CHAPTER 1



Introduction: What Is Empirical Musicology?

Nicholas Cook and Eric Clarke 

I believe there's a real world out there, because not all of my fantasies work.



The words are those of the composer and music theorist Benjamin Boretz (1977: 
242), and they articulate a level at which it is hard to envisage work in musicology 
or theory that is not empirical. Any archival musicologist, after all, knows that facts 
can be very hard indeed — though, as we shall see, it would be more correct to say 
that facts are a matter of interpretation and that it is the data that are hard. In the 
same way, different analysts' Schenkerian interpretations of a given passage may well 
differ (and Schenkerian analysis is supported, or at least surrounded, by a discourse 
that is largely speculative if not at times metaphysical), but they are closely regulated 
by the score on which they are based; indeed the trial-and-error process by which 
music-analytical interpretations develop, with observation leading to interpretation 
and interpretation in turn guiding observation, is a model of close, empirically reg- 
ulated reading. Theorists and composers have both on occasion invoked the lan- 
guage of experimentation, too; for example, Marion Guck (1994: 62) has described 
her analyses as "(thought) experiments," but the best known of such invocations is 
Milton Babbitt's (1972a: 148) claim that "every musical composition justifiably may 
be regarded as an experiment, the embodiment of hypotheses as to certain specific 
conditions of musical coherence." 

In short, there is no useful distinction to be drawn between empirical and non- 
empirical musicology, because there can be no such thing as a truly non-empirical 
musicology; what is at issue is the extent to which musicological discourse is 
grounded on empirical observation, and conversely the extent to which observation 
is regulated by discourse. The idea of regulation is essential in this context. Michel 
Foucault (1970) has illustrated this point through reference to the comparative illus- 
trations of human and bird skeletons published in 1555 by Pierre Belon: as Foucault 
says, these illustrations look like the products of nineteenth-century comparative 
anatomy, but the resemblance is little more than chance, because the interpretational 
grids of sixteenth-century and of nineteenth-century thought are so different. 1 In 
other words, what we generally think of as empirically-based knowledge — as sci- 
ence — depends not only on observation but also on the incorporation of observation 
within patterns of investigation involving generalization and explanation. (That is
what turns data into facts.) It also depends on the more fundamental criterion of 
replication: if an observation is to be regarded as trustworthy, it should be possible to 
make it on different occasions, and it should be possible for different people to make 
it. The issue, then, is whether musicology fulfils these conditions — whether, in 
short, its interface with Boretz's "real world out there" is as well managed and under- 
stood as it might be. 

Musicologists are certainly aware of the distinction between data and facts (see, 
e.g., Dahlhaus 1983: 34). Researchers with a background in the hard or social sci- 
ences, however, might well question whether most musicologists are sufficiently 
aware of the methodological consequences of this distinction. Like most humanities 
scholars, musicologists are prone to build interpretations on very small data sets or 
even on single instances, and the less the evidence that has survived from the past, 
the stronger this tendency will be. In the study of medieval music, for instance, so 
little documentation has survived that what does exist often lacks a secure context, 
and under such conditions it becomes impossible to avoid circular argument: if your 
starting point is that there are hidden meanings in fourteenth-century motets, then 
you are bound to deduce that there were sophisticated contemporary audiences ca- 
pable of appreciating them, and this then becomes evidence for the hidden mean- 
ings (Leech-Wilkinson 2000). Without sufficient evidence to prove or disprove the 
hypothesis, it is simply not possible to cut through the circle; the problem is en- 
demic. It follows that, as David Huron (1999) has pointed out, the issue is not one 
of good or bad methodology, but of what is viable in data-poor as against data-rich 
fields. In most (though not all) of the physical and social sciences, it is possible 
through systematic programs of observation to acquire large bodies of data, which 
may then be manipulated statistically and subjected to measures of statistical signif- 
icance. But in fields like medieval music, this is simply not possible, and so argu- 
ments based on statistically insignificant samples or single instances are inevitable; 
a further result, arguably, is that scholars become wedded to their interpretive hy- 
potheses, since there is rarely the evidence to conclusively overthrow them, result- 
ing in a degree of conservatism that can easily turn into dogmatism. 2 That is the 
price that has to be paid for working in data-poor fields. 

While this is no argument for abandoning such fields, there would be grounds 
for legitimate criticism if musicologists working in data-rich fields did not take full 
advantage of the methods available under such conditions, instead restricting them- 
selves to traditional "humanities" approaches developed for data-poor fields — and 
one of the messages of this book is that musicology is or could be, in many instances, 
a significantly "data richer" field than we generally give it credit for. (More bluntly, 
there may be many musicological certainties that would not survive a systematic en- 
gagement with the available data.) The same applies to a second characteristic of 
most work in musicology, which is its retrospective nature. One of the obvious de- 
terminants of historical method is that you can't run history again under different 
conditions and see how it turns out. (What would the history of nineteenth-century 
music have been like if Mozart had died in 1845, at the age of 89?) Once again, there 
is no point complaining about this; it is simply how history is. You could reasonably 
complain, however, if there were areas of musicology in which prospective work (crudely,
making predictions and then testing them) would be possible, but was not
carried out, perhaps as a result of the discipline's predominantly historical self- 
image. 

Empirical musicology, to summarize, can be thought of as musicology that embodies
a principled awareness of both the potential to engage with large bodies of
relevant data, and the appropriate methods for achieving this; adopting this term 
does not deny the self-evidently empirical dimension of all musicology, but draws 
attention to the potential of a range of empirical approaches to music that is, as yet, 
not widely disseminated within the discipline. And just as it is not a matter of em- 
pirical versus non-empirical, so we do not wish to draw an either/or distinction be- 
tween the objective and the subjective. In order to illustrate this point we may re- 
turn to Guck and her "(thought) experiments." She coins this term with specific 
reference to Hans David's description of the C♭ in bar 53 of the second movement
from Mozart's G minor Symphony K 550 as "unexpected" (a description that Babbitt 
had in the 1960s characterized as an "incorrigible personal statement"), 3 and goes 
on to outline a way of thinking about the movement that explains this "unexpected" 
quality: she likens the C♭ to an "indomitable immigrant" (1994: 72), at its first
appearance conspicuously foreign to the tonal environment of the movement, but
eventually assimilated within it and even ultimately serving to transform it ("C♭ has
succeeded to the leadership of its community" [1994: 70]). In short, she describes a 
way in which she can hear the music, and invites her reader to share her experience. 

In what sense can this be properly called an experiment, "thought" or otherwise? 
There is no null hypothesis, 4 no control or randomization of potentially extraneous 
variables, no control group. To say that, however, is not to say — as a perhaps too ca- 
sual reading of Babbitt might imply — that it is an exercise in uncontrolled, purely 
subjective speculation. If Guck's frankly fictive account of the immigrant C♭ articulates
a way of hearing the music that other people can share, then it can be regarded
as a discovery procedure resulting in a replication of experience, and hence in a measure
of intersubjective agreement. And indeed resort to measures which are replicable
but not necessarily definable in objective terms is quite normal in musicology. An 
example is the coding of folk songs employed in Alan Lomax's Cantometrics project, 
which involved a large number of researchers scoring recorded songs for such qual- 
ities as nasality: Lomax explained that we don't know how to define nasality in ob- 
jective or productional terms — but what matters, he said, is that in practice there is 
"good consensus on the presence of great nasality or its relative absence" (1968: 72). 

Guck's and Lomax's arguments, however, would have cut little ice within the 
culture of objectivity that characterized much postwar musicology and theory (and 
in which some of the origins of empirical musicology are to be found). Rather like 
the compositions at the contemporaneous Darmstadt Ferienkurse für neue Musik,
such work reflected a distrust of conventional approaches and even terminologies; 
the traditional language of musicology seemed hopelessly compromised by latent 
subjectivity, and so it seemed necessary to reduce analytical statements to objective 
propositions or, if this couldn't be done, to abjure them. Arthur Komar (1971: 11) 
referred to "designing a set of rigorous terms for music" as "the serious but unful- 
filled goal of current music theory," but the definitive statement, dating from 10 
years earlier, was once again Babbitt's ("there is but one kind of language, one kind 
of method for the verbal formulation of 'concepts' and the verbal analysis of such
formulations: 'scientific' language and 'scientific' method" [Babbitt 1972c: 3, but 
originally published in 1961]). And when translated into practice, the results in- 
cluded attempts to implement existing analytical approaches as computer programs 
(Kassler 1967), or to express new ones in the forms of symbolic logic (Boretz 1970) 
or machine-readable algorithms, as in the case of Allen Forte's (1973) pitch class (pc)
set theory. They also included some picturesque attempts to extract rigorous content 
from such "incorrigible" ordinary-language statements as the claim in Cobbett's Cy- 
clopedic Survey of Chamber Music that "the spirit of nationalism is felt in all of the best 
chamber music": Fred Hofstetter, a pioneer of humanities computing, reduced this 
to the more straightforward claim that "composers differ from one another as a func- 
tion of their nationalities" (Hofstetter 1979: 105), selected a representative sample 
of chamber music by French, German, Czechoslovakian, and Russian composers, 
defined a stylistic measure (the relative frequency of different intervals), and did the 
sums. His conclusion, perhaps unsurprisingly, was that there are indeed differences 
between national styles, and that the most distinctive style is the Russian. 
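
The kind of stylistic measure described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text says that Hofstetter used "the relative frequency of different intervals" but does not say how intervals were counted or how samples were compared, so the function names, the choice of melodic intervals measured in semitones, the simple distance measure, and the two toy melodies are all hypothetical rather than a reconstruction of his procedure.

```python
from collections import Counter


def interval_frequencies(pitches):
    """Relative frequency of each melodic interval size.

    `pitches` is a list of MIDI note numbers; intervals are taken between
    successive notes and measured in semitones, ignoring direction
    (an assumption made for this sketch).
    """
    intervals = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    if not intervals:
        return {}
    counts = Counter(intervals)
    total = len(intervals)
    return {size: n / total for size, n in sorted(counts.items())}


def profile_distance(p, q):
    """Sum of absolute differences between two interval profiles.

    0 means identical profiles; 2 means no interval size in common.
    This comparison is hypothetical, chosen only for simplicity.
    """
    sizes = set(p) | set(q)
    return sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in sizes)


# Toy melodies invented for illustration, not taken from any repertory.
melody_a = [60, 62, 64, 65, 67]  # stepwise motion
melody_b = [60, 67, 60, 67, 72]  # leaps of fourths and fifths

freq_a = interval_frequencies(melody_a)
freq_b = interval_frequencies(melody_b)
print(freq_a, freq_b, profile_distance(freq_a, freq_b))
```

In a study of this type, profiles would be computed for each national sample and the differences between them then tested statistically; the sketch stops short of that step because the text does not specify the tests used.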

The conclusion, of course, was not the important thing. The point of Hofstet- 
ter's project was to demonstrate that informal statements about music could be re- 
duced to formal propositions, in which form they could be subjected to rigorous 
testing. And it is this ideologically motivated idea of reduction that seems so foreign 
from the vantage point of the twenty-first century, when this kind of unreflective 
positivism is no longer widespread, even (one might almost say "particularly") in the 
hard sciences. The "nothing but" kind of objectivity embodied in Hofstetter's proj- 
ect is evident enough, but subtler forms of the same thinking can be more insidious: 
as Gerald Balzano (1987) has convincingly argued, a chronic problem in music psy- 
chology has been the tendency to understand perception as an internalization of ob- 
jective (e.g., acoustic) categories and structures. Postwar reductionism, then, has left 
a legacy that has not been entirely shaken off, but it is not the focus of "empirical 
musicology" as defined in this book: the culture of objectivity in the 1960s and 70s 
reflected an epistemological world view formed by the ideal of scientific progress, a 
stance that might be described not so much as "empirical" as "empiricist." The ori- 
entation of this book, by contrast, is intended as an essentially pragmatic one, in 
which reduction is seen — in Huron's (1999) words — as "a potentially useful strategy 
for discovery rather than a belief about how the world is." 

In some ways the positivist approaches of the postwar period were more a mat- 
ter of appearances than of substance; even at the time, Forte's pc set theory was crit- 
icized on the grounds that, for all its apparent objectivity, it was based on analytical 
decisions about how to divide the music up into segments for which there were no 
properly defined criteria. But there was a quite different problem that early examples 
of apparently objective analysis tended to present, which is what one was meant to 
do with them, what their value was. An appropriate example, since it achieved con- 
siderable exposure at the time, was Matt Hughes's (1977) "quantitative analysis" of 
Schubert's Moment Musical in C major, Op. 94 no. 1. This was certainly objective in 
the sense that a suitably programmed computer could have carried out the analysis 
without human intervention: it began with a straightforward note count, not in the 
serial sense but as a simple computation of how many Cs, C♯s, Ds, and so on there
were (that is, it is pitch classes that were counted, and the values were weighted by
durations). This provided a measure of the tonal "orientation" of the piece (which 
turned out to be directed toward G rather than the tonic, C); the values for each 
pitch class were then mapped onto the cycle of fifths, and analysis of the resulting 
pattern of peaks yielded a measure of the piece's tonal "complexity." What is of con- 
cern here, however, is not the details of the method, but the context in which it was 
presented, and what came of it. The reason for its exposure is that it formed part of 
a "symposium" on Op. 94 no. 1 published in a widely disseminated collection ed- 
ited by Maury Yeston; the other elements of the symposium were a "compositional 
analysis" by Lawrence Moss and two Schenkerian analyses by Carl Schachter and 
John Rothgeb. The two Schenkerians engaged with one another, but otherwise — as 
frequently happens at symposia — the contributors talked (or rather wrote) past one 
another, without any form of mutual communication being opened up. And since 
then, Hughes's approach has disappeared more or less without trace. 
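
The first steps of the procedure described above (a duration-weighted count of pitch classes, then a mapping of the totals onto the cycle of fifths) are simple enough to sketch in Python. The sketch rests on assumptions: the function names, the representation of notes as (MIDI pitch, duration) pairs, and the toy input are hypothetical, and Hughes's actual measures of tonal "orientation" and "complexity" are not reproduced because the text does not define them. Taking the most heavily weighted pitch class is used here only as a crude stand-in.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]


def weighted_pc_profile(notes):
    """Total duration sounded on each of the 12 pitch classes.

    `notes` is an iterable of (midi_pitch, duration) pairs; octave is
    discarded by reducing each pitch modulo 12.
    """
    totals = [0.0] * 12
    for midi_pitch, duration in notes:
        totals[midi_pitch % 12] += duration
    return totals


def fifths_order(profile):
    """Reorder a chromatic profile along the cycle of fifths (C, G, D, A, ...).

    Each step of a fifth is 7 semitones, so position k holds pitch class 7k mod 12.
    """
    return [(PITCH_CLASSES[(7 * k) % 12], profile[(7 * k) % 12])
            for k in range(12)]


# Toy input invented for illustration; it is not Schubert's score.
toy_notes = [(60, 1.0), (64, 0.5), (67, 2.0), (67, 1.0), (62, 0.5), (71, 0.5)]

profile = weighted_pc_profile(toy_notes)
by_fifths = fifths_order(profile)
# Crude stand-in for "orientation": the pitch class with the largest total.
heaviest = max(by_fifths, key=lambda pair: pair[1])[0]
print(by_fifths)
print(heaviest)
```

Because every step is mechanical, the procedure is objective in the sense the text describes; what the resulting numbers mean musically is a separate question, which is the point taken up in what follows.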

All this amounts to saying that there was little context within which to under- 
stand Hughes's analysis, and that context is essential to value. This can be made clear 
through a comparison with current work taking place on the border between music 
theory and cognitive psychology, which embraces the same kind of quantitative ap- 
proach illustrated by Hughes's analysis but sets it in a more developed context. Fred 
Lerdahl's Tonal Pitch Space (2001) builds on the foundation of his well-known work 
with Ray Jackendoff (A Generative Theory of Tonal Music, 1983), but fills in some of 
the gaps of the earlier theory by incorporating a model of tonal "pitch space" that 
goes back in its essentials through Schoenberg and Riemann to Oettingen: a
two-dimensional matrix whose axes are minor thirds and perfect fifths. In essence Lerdahl
uses this pitch space as a means of evaluating the perceptual distance between
different pitch classes, and this enables him to develop a range of quantitative mea- 
sures for tonal tension and the attraction between pitch classes and chords, includ- 
ing a model of key derivation. What is important here is not the undoubted techni- 
cal sophistication of Lerdahl's model, but the diversity of its linkages. In the first 
place, like A Generative Theory of Tonal Music, it is based in musical intuition (and 
Lerdahl is strikingly keen to ground it in his credentials as a composer rather than 
as a theorist, writing in the very first sentences of the book that, following the pub- 
lication of A Generative Theory of Tonal Music, "fresh theoretical ideas began to in- 
trude on my time for composing. The only way to unburden myself of them was to 
work them out and write them down" [2001: v]). In the second place, it not only 
draws on a variety of established theoretical traditions but is also illustrated through 
a variety of substantial analytical applications. And in the third place, it is expressed 
in more or less empirically testable terms, that is to say in terms of predictions of 
what listeners will perceive — and indeed the development of Lerdahl's thinking be- 
tween 1983 and 2001 in part reflects his collaboration with a number of experi- 
mental psychologists, in particular Carol Krumhansl. It is these connections which 
provide the context that was lacking in Hughes's work. 

Krumhansl has been a key figure in the interaction between music theory and 
cognitive psychology to which we referred; other major theorists with whom she has 
worked include Eugene Narmour, whose theoretical approach to melodic structure 
she has tested experimentally, concluding that "the uniformity with which the present
results supported the implication-realization model encourages the view that the
model has successfully codified psychological principles governing melodic expec- 
tations" (Krumhansl 1985: 78). In fact she organized a year-long seminar at Stanford 
University during 1993-1994 in which both Lerdahl and Narmour took part, along 
with the music theorist Robert Gjerdingen, as well as the music psychologists Jam- 
shed Bharucha, Caroline Palmer, and of course Krumhansl herself; the outcomes were 
published in a special issue (1996) of Music Perception. All six researchers worked on 
the first movement of Mozart's Sonata in E♭, K 282, but the result is in some ways
reminiscent of the symposium on Schubert's Op. 94 no. 1 in which Hughes's article 
appeared. Of the music psychologists, both Krumhansl and Palmer engage fully with 
the music theorists' work, each of them testing predictions derived from Lerdahl's and 
Narmour's models (with generally rather mixed success); Bharucha at least references 
the work of all three theorists. Lerdahl's contribution, which anticipates central ele- 
ments of his 2001 book, includes a final section called "Connections" containing a 
single sentence on Gjerdingen, and a more extended discussion of Narmour: his aim 
here is to show how the basic insights of the implication-realization model can be 
accommodated within his own, more wide-ranging model. Narmour makes single- 
sentence references to each of Lerdahl and Gjerdingen. As for Gjerdingen, he con- 
spicuously omits any direct citation of either Lerdahl or Narmour (though he does 
include a pointed reference to "oversimplified assertions based on an imagined cal- 
culus of imagined tonal forces," Gjerdingen 1996: 370), instead contributing an 
exercise in historical musicology that has few if any points of contact with the other 
articles. A final contribution by Leonard Meyer (who was not present at the semi- 
nars) underlines the effect of fragmentation through being structured as a series of 
separate responses to each of the participants. 

The intention of these comments is not to question the significance of such 
work in its own terms, but to differentiate it from the "empirical musicology" pro- 
posed in this book. One obvious point about it is the division of labor: Lerdahl's, 
Narmour's, and Gjerdingen's work is intrinsically no more empirical than a great deal 
of music theory, while the work of the psychologists is not, and is not intended to 
be, musicological (thus Krumhansl specifically writes in the Preface of her Cognitive 
Foundations of Musical Pitch [1990: vii] that "the approach taken is that of cognitive 
psychology"). The model is rather one in which music theorists develop their ideas 
on a more or less intuitive basis, following which they are passed along to the psy- 
chologists and tested experimentally (a model replicated in the structure of the spe- 
cial issue of Music Perception, which presents the work of the three theorists followed 
by that of the three psychologists); in principle the expectation might be that the 
theorists would then revise their models in light of the experimental findings, 
though in practice examples of this are rather hard to find. But the more fundamen- 
tal lack of communication is between the theorists, and the reason for this lies in the 
nature of the theories. Huron (1999) draws a distinction "between those theories 
that claim to usurp all others, and those theories that can co-exist with other theo- 
ries"; in essence Lerdahl's and Narmour's theories fall into the former mould (which 
is why Lerdahl has to translate Narmour's concepts into his own theoretical language 
in order to accept them). Another way of saying more or less the same thing is that 
such totalizing, mutually incommensurable theories place the emphasis less on the 
analysis as such than on the theory which the analysis serves to illustrate; there is a
real sense in which Lerdahl's and Narmour's books are not so much about tonal pitch 
space and melody but about their respective theories of tonal pitch space and 
melody. And this theoretical commitment in turn means that the reductions on
which their theories are based give the appearance, at least, of embodying beliefs 
about how the world is rather than simply representing potentially useful strategies 
for discoveries (to borrow Huron's words again). Such work, then, has a fundamen- 
tally different orientation from the pragmatic, tool-oriented approach to "empirical 
musicology" presented in this book, whose aim is to document a number of domains 
in which empirical methods have had an impact on broadly musicological enter- 
prises, to provide some practical guidance in the application of those methods, and 
to illustrate some examples of the kinds of study that have made use of them, as well 
as considering some of their theoretical consequences. 

The book is organized as follows. Chapter 2 (Stock) provides an overview of 
empirical methods in ethnomusicology with a particular focus on participant-obser- 
vational fieldwork — a methodology that, though primarily associated with ethno- 
musicology, nonetheless has considerable potential for musicology more generally 
(i.e., it has as much application to Gilbert and Sullivan productions, and perhaps 
classical concerts at the Lincoln Center, as to ritual music in Taiwan). As Stock points 
out, the central principle of participant-observational fieldwork is the need for the 
researcher to become, as far as possible, an "insider" in the culture in question, to 
observe it and participate in it, and interpret it according to its own standards. Since 
music is very much more than just the production of certain sorts of sounds, but in- 
volves a huge variety of processes (social, financial, technological, organizational), 
participant observation has the potential to confront a researcher with an enormous 
mass, and wide variety, of data. The aim of the chapter is to provide practical advice 
on how to conduct and organize this kind of research — from preparation and plan- 
ning, through the use of a field log, the organization of field notes and audio or video 
recordings, and appropriate interviewing techniques, to the manner in which the 
ideas, attitudes, and "expressive styles" of informants are represented in published 
accounts of the research. There may be considerable barriers to becoming anything 
like an insider in some specialized musical subcultures (a participant-observer in 
the Leeds International Piano Competition, for instance, would need to develop for- 
midable skills as a pianist), but as Stock shows, using examples drawn from a wide 
range of musical traditions, there is a great deal that can be learned by studying any 
kind of music from within its own cultural practices. 

Chapter 3 (DeNora) is also concerned with ways in which music might be stud- 
ied and understood as socially embedded, but from the perspective of contemporary 
sociological theory and practice. Though the work of Theodor Adorno offers argu- 
ably the most ambitious and (still) influential theoretical tradition within the sociol- 
ogy of music, it certainly does not offer much encouragement for adopting an em- 
pirical approach to the subject. Adorno's approach, perhaps best understood against 
the backdrop of the appropriation of culture for purposes of propaganda in the 
Third Reich, was firmly rooted in a conceptually sophisticated analysis of music's 
ideological dimension; he saw social and ideological structures as replicated within 
music (as well as acted upon by music), and so encouraged "critical" readings that 
turned away from the actual social circumstances surrounding the production and
reception of music, and towards the close reading of musical texts. Like other soci- 
ologists of music today (e.g., Martin 2002), DeNora sees serious dangers in the ab- 
straction from social reality that such an approach entails. She identifies the continu- 
ing influence of Adorno in the writings of Lawrence Kramer and Susan McClary; such 
products of the so-called "New" musicology, she argues, maintain the traditional 
separation between musical works and the contexts of their production, perform- 
ance, and use, and as a result have no means to describe music as it functions within 
real social settings, and in specific times and places. By contrast DeNora provides a 
"toolkit" of different empirical approaches to the analysis of music as a social pro- 
cess, as it participates in people's everyday lives and sense of identity. She gives ex- 
amples of work that has examined the impact of social factors on composition (in- 
cluding that of commercial competition on innovation in pop music), social factors 
in the construction of musicians' reputations, the relationship between subcultural 
identity and musical taste, and music's role in the social construction of subjectivity. 
These studies use empirical methods ranging from participant observation (as discussed
by Stock), through interviews and the analysis of historical documents, to the
more impersonal methods of large-scale social statistics and economic surveys. 

If DeNora's chapter engages with people's socially constructed experiences of 
themselves and others through and around music, chapter 4 (Davidson) considers 
a range of empirical methods to investigate the social character of music, as seen 
through the lens of a more explicitly psychological approach. Starting from the ob- 
servation that the overwhelming majority of music-making is social in one way or 
another, Davidson looks at empirical methods for investigating music as social be- 
havior, ranging from controlled experiments and personality inventories, through 
video-recorded observation and covert manipulation of people's musical environ- 
ments, to the use of diaries and interviews as ways of tracking people's involvement 
in music. The cultural preoccupation with the musical skills of outstanding individ- 
uals has led to a significant body of research inquiring into the factors that might ex- 
plain or predict the appearance of such skills, and Davidson describes the use of 
large-scale quantitative methods in this field, as well as more focused and intimate 
enquiries focusing on a single family. This provides an introduction to some of the 
principles and methods of qualitative data analysis, which are also employed in 
analyses of the social processes involved in ensemble rehearsal and performance — 
a domain that has enormous potential but which has only recently been explored 
within a social psychological context. Finally, a number of authors have proposed 
that musicians' social interactions and behaviors are a function of their personality 
types, and Davidson provides an overview of some of the empirical methods by 
which people have attempted to measure personality attributes. 

Performance studies is the area of musicology which has arguably shown the 
greatest impact of empirical methods, and chapter 5 (Clarke) presents an overview 
of those methods and influences. As musicology has moved away from its overriding 
preoccupation with the score, and toward an understanding of music as perform- 
ance, it has adopted some of the methods — and even some of the explanatory prin- 
ciples — of the kind of research on performance that originated in psychology. Em- 
pirical studies of performance go back as far as the end of the nineteenth century, 
but it is really since the development of cheaper and more powerful desktop computers
in the late 1970s and 1980s, with their ability to handle large quantities of
data and to record and analyze sound files on hard disk, that detailed empirical per- 
formance research has become a practical possibility for a large and growing popu- 
lation of researchers. The chapter gives an outline of this developing pattern of ac- 
tivity, with case studies illustrating the ways in which performance researchers have 
investigated keyboard performance directly from the instrument, a wider range of 
performances from sound recordings, and performers' gestures and body move- 
ments from video recordings. Practical advice about handling and interpreting MIDI 
(Musical Instrument Digital Interface) data is provided, as well as a discussion of the 
advantages and drawbacks of studying performance from the three most commonly 
used media (MIDI, sound, and video). Finally, the chapter considers the nature and 
explanatory function of artificial models of performance, arguing that they are best 
understood not as attempts to supplant or even mimic human expressive perform- 
ances, but as a way of establishing the general principles or "norms" against which 
genuinely expressive performance is projected. 

Chapters 6 and 7 (Cook and Pople, respectively) move away from music as a 
cultural, social, or behavioral event, and focus instead on the use of empirical meth- 
ods in relation to musical scores. Any study of a musical score is empirical in that it 
pays attention in some fashion to the "data" of the music, but traditional analysis 
does this only on a very small scale, in relation to single pieces and with very little 
attention given to the insights that might be gained from a more "data-rich" and 
comparative approach. Chapter 6 (Cook) is concerned with various ways in which 
systematic investigations of larger repertories of music can be undertaken, starting 
with matters of representation (if the aim is to search large databases of music, how 
do you represent the music in a way that is flexible and appropriate?), and then 
going on to the kinds of tool that have been developed in order to search for sys- 
tematic patterns in the data. Following a review of some earlier approaches to score 
representation and the operations that can be performed on the resulting databases, 
Cook focuses on David Huron's Humdrum toolkit — an approach to the representa- 
tion of scores, and associated search techniques, which aims to be as flexible and 
open-ended as possible, allowing users to create new tools to suit their own
purposes. To illustrate how Humdrum can be used, Cook describes
studies in which Huron and his coworkers have evaluated analytical
claims regarding motivic structure in Brahms, the relationship between style and
geographical location in folk song, and the extent to which trumpet music is idiomatic
(i.e., is designed around the particular qualities and shortcomings of the
instrument). As with any software, the practical usefulness of Humdrum depends on
availability and usability, and the chapter concludes with an assessment of the difficulties
of turning cutting-edge research approaches into the everyday tools of musicologi- 
cal enquiry. 

Chapter 6 is concerned with the analysis of large bodies of musical data. By con- 
trast, chapter 7 (Pople) takes a closer look at what can be done when systematic 
methods are applied to individual works — in other words formalized analytical 
methods. These date back at least as far as the 1960s when, as we have seen, there 
was a widespread feeling (especially in America) that the analysis of music should
be made more scientific, but it is really only since the early 1980s (in particular
with the publication in 1983 of Lerdahl and Jackendoff's A Generative Theory of Tonal
Music) that serious attempts have been made to formalize, and even to automate,
aspects of the analytical process. This formalism is not in itself a development in em- 
pirical method, but it has important empirical consequences: once a method is for- 
malized, it becomes possible to apply it in a systematic and uniform manner and 
from this to discover what the empirical consequences of the approach really are. A 
range of approaches to which this applies is surveyed (from methods based on arti- 
ficial intelligence to neural nets), but it is particularly applicable to Pople's own 
"Tonalities" project, with which the chapter ends. Designed specifically with a view 
to the transition between tonality and atonality in the early twentieth century, but 
with distinctly wider applications, the "Tonalities" software represents an empirical 
means by which analysts can become increasingly aware of the consequences of their 
intuitions concerning a piece's structure. Through testing their intuitions in this way, 
analysts confront their unconscious preoccupations and blindspots: just as artificial 
performance models of the kind discussed in chapter 5 confront researchers with the 
consequences of a given theory, so Pople's approach uses the computer to flush out 
the sometimes unwelcome consequences that informal methods can all too easily 
skirt around. 

From the relatively clear representational categories of the score, chapter 8 
(McAdams, Depalle, and Clarke) turns to the messier empirical reality of musical 
sounds, and the ways in which those sounds can be represented and empirically in- 
vestigated. The first half of the chapter deals with a variety of ways in which sound, 
as a physical signal, can be represented in various kinds of computer-based visual- 
izations, each allowing different kinds of properties to be revealed and different 
questions to be asked. The chapter provides a detailed and systematic account of the 
nature of these visualizations and the acoustical principles on which they are based, 
together with practical advice about how such representations might be generated, 
and examples from the musicological literature of the use of such methods with a 
variety of musical styles. The second half of the chapter turns to the ways in which 
a perceptual (rather than physical) representation of sounds might be used to shed 
light on musicological questions. Perceptual principles can help to explain how and 
why sounds group together in both time and vertical texture, which in turn sheds 
light on a variety of issues involving orchestration, contrapuntal procedure, and 
rhythmic organization. This discussion of principles, again combined with practical 
advice about how data might be represented, leads to two final analytical applica- 
tions — one a perceptually motivated study of orchestration in Schoenberg's music, 
and the other a perceptual rationalization of traditional voice-leading rules. 

Many of the approaches discussed in the book involve the generation of data to 
which a number of general principles apply, revolving around the conditions under 
which they are collected and the kind of control that is needed, as well as the iden- 
tification of optimal forms of data representation and analysis. The purpose of the 
final chapter (Windsor) is therefore to explain the principles, and some of the spe- 
cific methods, of experimental design and statistical procedure in a range of musical 
contexts. Empirical data in broadly musicological research might include any of the 
following:

Score data

Sound recordings 

MIDI data from keyboards (and possibly other instruments) 

Video data 

Diary data from performers, composers or listeners 

Interview data 

Audience questionnaires 

Data from a textual or visual analysis of CD covers or program materials 

Quantitative data relating to the sale of musical "merchandise" (e.g., recordings, fanzines, tee-shirts)

The value or otherwise of such data lies in the larger research context within 
which they occur, and the musicological uses to which they are put. Reference to 
data on tee-shirt sales, for example, may look banal and of only bookkeeping
significance; but if research into music and identity found that a powerful
index of audience members' sense of identity with classical music was their
willingness to buy the merchandise associated with it, then it might be important to 
know that a rise in tee-shirt sales in one specific year of a particular music festival 
wasn't just the result of lower prices, fancier designs, or the sudden influx of a new 
and more style-conscious sector in the audience. Apparently trivial information, in 
other words, may turn out to be musicologically valuable — but only if appropriately 
interpreted. The contribution that an empirical approach can make is not to be en- 
dorsed (or dismissed) simply because of its empiricism, but rather for what it can 
help to discover or reveal. And in order to discover or reveal anything at all, we need 
appropriate methods as well as good questions. That is what this book is about. 



Notes 

1. "The grid through which we permit the figures of resemblance to enter our
knowledge happens to coincide at this point (and at almost no other) with that which
sixteenth-century learning had laid over things" (Foucault 1970: 22). For a more 
extended discussion of this quotation and the general issue see Cook 2002: 80. 

2. For a characterization of medieval musicology in precisely these terms see Leech- 
Wilkinson (2002), chapter 4; for the argument that data-poor fields breed interpre- 
tational conservatism see Huron 1999, on which we draw at many points in this 
chapter. 

3. Babbitt 1972b (but written in 1965), 11-12; a discussion may be found in Guck 
1994. 

4. Formally speaking, scientific experiments are designed to refute the "null hypothe- 
sis" that no effect is attributable to the factor(s) being tested (you can never prove the 
null hypothesis).

CHAPTER 2



Documenting the Musical Event: 
Observation, Participation, Representation 

Jonathan P. J. Stock 



Empirical approaches have contributed to research in the field now named ethno- 
musicology since at least the nineteenth century. Comparative musicologists and 
folklorists in Europe, the Americas, and elsewhere each drew on the new technolo- 
gies of sound recording and mass publication in order to develop distinct forms of 
scholarly empiricism. More recent generations of folk music scholars and ethnomusi- 
cologists have measured their work against further empirical norms, and there re- 
mains today an intriguing relationship between the discipline's research methods, 
available technology, and the pattern of its truth claims. Ethnomusicologists have de- 
veloped empirical approaches to transcription, the analysis of musical sound, the dis- 
tribution of instruments and repertory, learning processes, and performance interac- 
tion, and further chapters could be written on each of these areas as well as on several 
others. Nonetheless, this chapter focuses on the topic of participant-observational 
fieldwork, because fieldwork not only is of central importance to enquiry in ethno- 
musicology but also can be a powerful research methodology for other musicolo- 
gists. This chapter looks at means of gathering empirical fieldwork data and associ- 
ated issues of interpretation, authority, and representation. Before looking in detail 
at fieldwork, however, I shall discuss broader aspects of empirical work in ethno- 
musicology by means of a brief historical summary. 



Introduction: The Empirical Urge in Ethnomusicology 
and Its Antecedents 

The invention of the phonograph in 1877 was almost a precondition for the disci- 
pline of comparative musicology as devised by European scholars in the final decades 
of the nineteenth century. Although some previous studies had used traveling musi- 
cians and sets of musical instruments purchased by colonial collectors, the new tech- 
nology of sound recording made two crucial contributions to the new discipline: 
first, it allowed researchers to assemble for comparative analysis extensive collec- 
tions of musical material from all around the world; second, repeated playback per- 
mitted the detailed study (and hence the transcription in modified staff notation) of 
non-Western musical sounds. Pitches were minutely measured and tonal systems 
postulated from these calculations. Meanwhile, instruments were categorized, not
by reference to the contexts of their use within any one cultural system, but strictly
in terms of their physical properties and performance technique (see, e.g., Ellis 
1885, Stumpf 1886, Abraham and von Hornbostel 1909, von Hornbostel and Sachs 
1914). Criticism of the musical styles in question was not undertaken; there was no 
attempt to explain or rank music in terms of its "greatness." Instead, the close analy- 
sis of recorded examples and musical artifacts was ultimately intended to reveal sci- 
entific principles underlying all human music making. 

Scholars of musical folklore were empirical in a different sense: not for them the 
careful weighing and measuring of recordings carried out by the comparative musi- 
cologists. Rather, many set out to gather and preserve what they saw as the essential 
music of their own nation. It was not enough to record such music in a haphazard 
manner, however, whether by means of notation or the phonograph. Instead, re- 
searchers conducted their enquiries in a systematic fashion, and they strove to locate 
those whom they deemed the most authentic representatives of tradition (Sharp 
1954: 119). Whole regions were surveyed (Bartók 1931, 1967); folk singers were
questioned about their entire repertories; recorded and transcribed materials were 
deposited in archives — the very real museums of musical works; variant melodies 
and song texts were tracked down and classified by means of increasingly complex 
systems (Bronson 1949, 1959, C. Seeger 1966); and national song collections were 
published and disseminated through educational channels. 

Nevertheless, the appeals to empiricism articulated by comparative musicolo- 
gists and folklorists alike were open to challenge. The comparativists' detailing of 
cents, 100 to the semitone, and production of "weighted scales" (diagrams that illus- 
trate the relative durations allotted to each musical pitch within a particular extract), 
look admirably open to replication by any competent scholar armed with the same 
sound recording and analytical gadgetry. Yet even setting aside the real challenges of 
deducing a series of fixed, measurable pitches from the continuous sonic fluctua- 
tions of live performance (Schneider 1991), it is now clear that the quest for an 
underlying science of music often overlooked indigenous systems of musical theory 
and practice, resulting in comparisons between musical apples and oranges. 
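
The two measurements mentioned above can be made concrete with a short sketch. It is an illustration only, not a reconstruction of any comparativist's actual procedure: the function names, the choice of reference pitch, and the sample measurements are all hypothetical, and the sketch assumes that a performance has already been reduced to discrete notes with a single frequency each, which is precisely the step that the text identifies as problematic.

```python
import math
from collections import defaultdict


def cents_between(f_low_hz, f_high_hz):
    """Interval size in cents: 1200 * log2(f_high / f_low).

    One equal-tempered semitone is 100 cents, so an octave is 1200.
    """
    return 1200.0 * math.log2(f_high_hz / f_low_hz)


def weighted_scale(notes, reference_hz):
    """Share of total sounding time allotted to each pitch.

    notes: list of (frequency_hz, duration_s) pairs (assumed to be
    already segmented into discrete notes).
    Returns {cents above the reference, rounded: share of duration}.
    """
    totals = defaultdict(float)
    for freq_hz, duration_s in notes:
        step = round(cents_between(reference_hz, freq_hz))
        totals[step] += duration_s
    grand_total = sum(totals.values())
    return {step: dur / grand_total for step, dur in sorted(totals.items())}


# Hypothetical measurements, invented for illustration.
notes = [(220.0, 1.0), (247.0, 0.5), (277.0, 0.5), (220.0, 2.0)]
profile = weighted_scale(notes, reference_hz=220.0)
for step, share in profile.items():
    print(f"{step:5d} cents: {share:.1%} of sounding time")
```

Two scholars running the same calculation on the same note list will agree, which is the sense in which such work looks replicable; the disagreements arise earlier, in deciding what counts as a note and which pitch serves as the reference.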

Likewise, the folklorists' insistence on "authentic" forms, careful tracing of vari- 
ants (techniques borrowed from manuscript studies in the fields of musicology and 
literature), and the laying out of melodic and rhythmic types, gives the appearance 
of contributing substantively to the gradual accumulation of knowledge about the 
repertory as a whole. However, more recent research recognizes the longstanding 
changeability of most folk musical traditions. Today's authentic example has all too 
often proven to be the previous generation's radical innovation. Terms such as au- 
thenticity, once so readily affirmed in discussions of sampling and representative- 
ness, now have little credibility except in commentary on specific individuals' or 
groups' claims of ownership or identity. 

If the goalposts of earlier, pre-ethnomusicological empiricism were uprooted by 
later generations, however, it was only to shift them for replanting elsewhere. First 
so entitled in the early 1950s, the discipline named "ethno-musicology" (the hyphen 
was subsequently dropped) represented both a new beginning and a fresh attempt 
to address existing problems. Comparativist approaches that emphasized the
scientific study of physical sound were sustained under the new regime by researchers
such as Charles Seeger who attempted to develop electronic transcription devices. 
These tools, among which the sonograph is perhaps the most widely used, were in- 
tended to facilitate the empirical study of non-Western musics through sidestepping 
the human subjectivity that might compromise aural transcription and analysis. 1 In 
a separate yet somewhat similarly inspired development, folklorist Alan Lomax 
(1968) devised a system of song analysis that tabulated some 37 technical traits of 
selected vocal examples (phrase length, melisma, tempo, nasality, and so on). These 
musical features were then compared to cultural aspects of the society in question. 
This system, cantometrics, allowed the detailed comparative study of songs from 
many societies without the prior translation of recorded sound into the potentially 
misleading conventions of Western staff notation (see Figure 2.1). Again, Mantle 
Hood (1971: 123-196), among others, proposed a new system for the classification 
of musical instruments: his graphic system of "organograms" was designed to ac- 
commodate multiple playing techniques, unlike the traditional hierarchical schemes 
based on performance technique (the high-level subdivision of lutes into bowed and 
plucked categories, for instance, made it hard to place an instrument like the violin 
that could be performed both ways). Ultimately, however, these attempts to fine- 
tune existing modes of empirical enquiry have proven tangential to the main thrust 
of the new discipline since the 1970s. 

It was the adoption of anthropological fieldwork as a primary research model 
that marked out ethnomusicology as distinct from earlier comparative approaches. 
In its classic sense, this meant at least one full year's residence within the culture 
being studied, speaking the local language and learning to perform local musical tra- 
ditions. Extended in situ research was intended to allow sufficient time for the eth- 
nomusicologist to become adept at understanding the contexts within which music 
making occurs, and to get to know individuals deeply enough for real discussion to 
take place. Formal interviewing, although sometimes the only way of speaking with 
particular individuals (and admittedly efficient for checking uncontested details), 
was seen as the least desirable form of research interaction, and the least likely to 
generate valuable insights. (Possibly only the mass questionnaire was regarded with 
greater distrust.) 2 Instead, the ethnomusicologist was to take part in and observe 
whatever music-related behaviors occurred customarily, becoming part of the exist- 
ing "community of speech" whose norms and values could be so easily displaced 
or closed off by the externally imposed constraints of interviewing. Learning to per- 
form was thus seen as a valuable research technique in that it facilitated access to 
other musical individuals and situations. It also provided personal experience of 
performance that could generate further research questions and insights. Mantle 
Hood (1960) coined the term "bi-musicality" to emphasize the researcher's duty to 
acquire some level of competence as a practical musician within the cultural setting 
in question. 

Nonetheless, and despite the fieldworker's efforts to develop his or her own mu- 
sical experience, the new ethnomusicologists agreed that interpretative weight was 
to be given to the musical explanations and evaluations offered by "insiders," that is 
to say members of the society in question: this constitutes "ethnomusicology" in the 
original sense of the term (the musicology of the people), a sense that relates it to
ethnopoetics and ethnohistory. Such insider accounts, for which the anthropological
term is "folk evaluation," became the authoritative basis of a new empiricism, and
the key responsibility of the fieldworker was to ensure that they were not
misappropriated in subsequent written analysis. There is, thus, a sense in which self/other
distinctions were both collapsed (the ethnomusicologist personally learns the music and
the culture that goes with it) and sustained (the ethnomusicologist cedes evaluative
and critical authority to local voices). In sum, then, the new methodology could claim
to be empirical in that it was based on sustained observation of and participation
within the culture in question, interpreted according to indigenous standards. 3

Figure 2.1. Cantometric profile of an excerpt from huju (traditional Shanghai opera)
sung by Yang Feifei. Criteria 15-37 (except 27) are concerned with the voice part only.



Investigating Music Through Participant-Observation and Fieldwork 

The study of music through fieldwork may be useful to students of Western musical 
culture in at least two respects. First, it is self-evident that music is more than sim- 
ply sets of sounds (and the notating of these sounds in symbols). Music is process as 
well as product, an arena for both social action and personal reflection; it is "emo- 
tion and value as well as structure and form" (A. Seeger 1987: xiv). Clearly, many of 
these essential parts of the whole complex that is music in any concrete setting are 
not immanent in the sonic material itself. Rather, they inhere, often only temporar- 
ily, among particular groups of socially and historically situated individuals. A study 
of these aspects of musical life will therefore need to integrate close examination of 
sound structures and symbols with analysis of the patterns of human action and 
thought that infuse these structures with meaning in specific social situations. Of 
course, the researcher able to discover what "music" is among a social group at any 
specific time is then well placed to assess questions of musical value and change 
within that society. Equally, the musicologist who analyzes what musicians and oth- 
ers actually do on particular musical occasions, and how these individuals explain 
what they do, is likely to gain enlightening perspectives on the sounds that 
emerge — there may be differences between theory and practice, for example, that 
repay close attention. 

Second, fieldwork-based approaches offer the socially orientated musicologist 
access to many kinds of music in Western society that have yet to be extensively writ- 
ten about from a historical perspective, and for which a full range of printed scores, 
published recordings, and other written documentation may not exist. Church bell 
ringing, musicals, and amateur popular music offer but three examples. In these 
cases, personal participation (possibly involving the researcher's performance as part 
of the musical ensemble) may be not just the best way of gaining access but the only 
way for the investigator to proceed. Elsewhere, there may be some written materials 
that can be rounded out through direct personal research. Professional popular 
music, for instance, is widely written about in journalism and also in several aca- 
demic disciplines, yet direct interaction with musicians, fans and others provides 
new perspectives on a range of issues not yet fully addressed in published writings. 

These points make it plain that the fieldworker is potentially faced with a huge 
amount of data: the whole field of musical sound (not only those elements recorded 
in notation), whole repertories of music-related behavior (from constructing and 
learning instruments to concert-going practice, modes of shopping for CDs, the use 
of hi-fi equipment, and habits of in-car listening as well as actual performance), and 
the social and mental arenas within which equally diverse individuals conceptualize 
and negotiate aspects of their musical experience. Information within these separate 
categories may conflict. The members of an ensemble might claim (and believe) that 
interpretative decisions are made democratically during rehearsal while observation
reveals that one individual normally provides most of the direction. Just as theoret- 
ical principles may be rarely realized in practice but are still valuable as ideals, so too 
there may be areas of practice that are not readily talked about. And as in the formal 
interview, the presence of the investigator may itself become a factor that affects the 
quality and nature of material collected. Gathering, organizing, and interpreting this 
heterogeneous body of data with any degree of confidence requires careful consid- 
eration of a number of problematic issues. 

Prefieldwork Preparation 

The first stage in any piece of field research is preparation. Depending on the site of 
the research, preparation may include language learning, preliminary instrumental 
or vocal lessons, library or archival research (familiarizing oneself with the musical 
traditions in question as well as with existing research), theoretical investigation of 
potential research questions (including looking at similar projects carried out else- 
where), acquisition of technical skills (learning to use, say, a digital audiotape [DAT] 
recorder or video camera), writing applications for funding, establishing initial con- 
tacts within the field, and assembling required materials (DAT cassettes, film, note- 
books and medicine, assuming the research is to be conducted in a location where 
these may not be readily available). A useful technique at this stage is to draw up a 
plan of research. The plan should sketch the main questions of the research, list 
known resources, summarize factors that will affect the conduct of the research, and 
note other conditions that will have to be met for the research to be satisfactorily 
conducted (see Table 2.1). 

Devising a plan makes it possible to cross-relate available resources and condi- 
tions in light of the main research questions. Potential problems in the plan might 
be exposed at once: the plan in Table 2.1 arguably involves so many genres that the 
researcher is unlikely to be able to acquire real expertise across the whole range. On 
the other hand, its embracing of instrumental music, vocal music, and dance to- 
gether with amateur and professional performance contexts may be valuable, and 
worth sustaining in some form. Perhaps research into two contrasting genres can 
occur throughout the year with others becoming topics of study for a briefer period 
at the end, when the researcher has built up more experience. A second criticism 
might be that although the field looks like being a relatively easy one to gain access 
to as a participant and observer, its very informality may make it difficult to pene- 
trate deeply. If musicians meet just once weekly and then spend most of that time 
playing, there may be relatively little opportunity for sustained musical discussion. 
The researcher may need to try to find another way of meeting some or all of the per- 
formers — individual instrumental or vocal lessons, if this is an appropriate model 
of learning within this community, might be one (though it raises further cost im- 
plications). Visiting people in their homes to review recordings or try out instru- 
ments might be another; this will require getting to know people well enough to set 
up such visits and to judge whether one is unreasonably imposing on them. 

Table 2.1. Sample plan of research: vernacular musics in South Yorkshire.

Main research questions

1. What role(s) does folk music play in contemporary England?
2. Who is involved in these musics, and why: what's in it for them?
3. What is the role of memorization, composition, and improvisation in these musics?

Resources

1. South Riding Chamber Orchestra: every Wednesday at the Red House, Sheffield; uses notation
2. Clog dance group
3. Hathersage carol singers: active only pre-Christmas
4. Listings of folk club sessions in SRFN News and local papers

Significant factors

1. A well established network of local enthusiasts and scholars; much advice readily available, much material already collected and existing contacts
2. An openness to newcomers both social and technical: no need to be of professional standard in order to take part
3. One year available for main body of field research

Conditions to be met

1. Easy to meet people collectively at practices and sessions, but will it be possible to talk more deeply in these situations? Will musicians mind being recorded or filmed? (Should I pay for these?)
2. Will it be possible to participate in each of these genres? (Can outsiders join the carol group?)
3. Will it be necessary to memorize large numbers of pieces and songs in order to progress beneath the surface? If so, is there time for this within one year?
4. Transport costs. Other costs? (Beer money!)

In contexts where initial access is more difficult, the preparatory stage will
include a potentially quite extended phase of writing to or telephoning contacts and
intermediaries. Before doing so, it is important to discover potential objections and
think through solutions to them, stressing ways in which the proposed research 
might advantage those under study (if nothing else, by virtue of an extra pair of 
hands on site at no cost — some reciprocation for the learning gained is often only 
fair, even if it is not always demanded). Education staff at a regional professional 
opera company in the North of England, for instance, were worried that a student 
who had proposed a period of research into the collective enterprise that constitutes 
opera production would need constant attention, thereby distracting them from 
their own work. Nonetheless, they were familiar with work placement schemes, and 
were supportive of the idea of an initial month-long period of research, a phase dur- 
ing which the student could attempt to gain trust and set up a longer residence. Sim- 
ilarly, school teachers may need to clear an incoming researcher's presence and ac- 
tivities with a head teacher and parents. Certain groups may simply (and perhaps 
reasonably) object to the idea of being "studied" at all (in such cases it is considered 
unethical to investigate them covertly), or it may become clear that, to be effective,
the research will demand linguistic skills or a commitment of time to on-site
residence that the researcher cannot meet. When problems of access or viability cannot
be resolved — and courteous persistence often pays off — it may be necessary to 
abandon the initial research plan and look for another topic altogether. 

A final aspect of preparation concerns technical familiarization. The fieldworker
may not have operated a video camera before, or may be proposing to record per- 
formances or discussions on an unfamiliar DAT, mini-disc, or cassette recorder. He 
or she may be unsure of microphone placement, selection of recording levels, or typ- 
ical battery life. In such cases, rehearsal and practice are indispensable. 

Field Log and Field Notes 

Once a topic has been selected, its viability examined, and funding secured, the field 
research itself can begin. At this stage, the fieldworker will need to tailor his or her 
approach to the musical field and context in question, a context that includes the 
research subjects' assumptions about the researcher. The researcher's gender, age,
class, ethnicity, and professional background may encourage the striking of certain 
attitudes by fieldwork contacts — perhaps they wish to shock, tease, or impress the 
researcher. While the formal interview or questionnaire is particularly prone to these 
kinds of distortion, other interactions can easily be affected as well. Two instances 
will illustrate this observation. A student who started to investigate views on music 
in gay clubs found that his attempts to initiate informal conversation about the 
music with individuals outside his group of friends were regularly assumed to be a 
subtle form of pickup line. Similarly, in educational contexts the incoming re- 
searcher may be assumed to be a "teacher," with the result that pupils try to give the 
"right" answer, or, more mischievously, impose their own agenda. A second student 
admitted that he had himself some years earlier been a research subject in an inves- 
tigation into talented young musicians at a specialist music school. Neither he nor 
his classmates had taken the research project seriously, he claimed, and they had 
vied with one another in faking data about the amount of practice time they put in. 
Fieldwork researchers seek to minimize these kinds of problem through their 
reliance on participation and observation as well as speaking with "informants" 
whom they come to know well. (Some writers prefer friendlier terms: consultant or 
field colleague.) The fieldworker attempts to participate in the life of the community 
in question for a sufficient length of time that he or she wins the trust and respect of 
those under study, and to discover through this process of familiarization how to ask 
questions or encourage conversation leading to genuinely meaningful information. 
Finally, the fieldworker hopes to become sufficiently aware of the values of the re- 
searched community that he or she can produce interpretations or analyses that are 
well founded. So, in the first example above, it may be that the researcher needs first 
to expand his circle of contacts among those who attend gay clubs; he needs to gain 
a clearer idea from experienced regulars about how to approach strangers without 
being misunderstood, and how best to steer conversation around to musical mat- 
ters — and this may enable him to reflect more directly on his own musical responses 
as a listener and dancer, particularly as these develop over time, and to make detailed
observations of the responses of others. In the second example, participant-observation
might mean becoming a pupil at the music school in question for one
academic year (more viable for a younger researcher than an elderly one, perhaps, 
though the mature student is no more "foreign" than the overseas ethnomusicologist 
in many another research situation); training (and hiring) one or more pupils as re- 
search assistants, thereby involving them personally in the project as researchers and 
not simply as data-bearers; or taking on a role as teacher in the school in order to 
gradually win the respect and fuller cooperation of pupils. 4 Whatever the specific so- 
lution, personal participation can often break down some of the "us and them" dif- 
ficulties that otherwise threaten the integrity of a project. Participation as some kind 
of peer (albeit an unusual one) typically leads to a level of personal experience that 
can directly inform research questions. Moreover, the researcher is well placed 
through participation to observe the actions of other individuals, and to set a picture 
of what people actually do alongside what they say they do. 

Whatever pattern of participant-observation is finally selected, a primary technique
for planning each day's research is the research log. Helen Myers suggested
that the fieldworker maintain a log in which a plan for each day's research is
sketched on the left-hand page of each pair; the right-hand page is then used to sum- 
marize what actually happened (Myers 1992: 40). This is certainly a means of en- 
couraging good planning and a systematic approach. 

Commonly, the researcher keeps two further notebooks. The first is the kind 
that easily fits into a pocket or small bag; this is used for jotting down thoughts or 
observations as they occur during the day, for sketching diagrams (layout maps of 
ensembles, venues, rituals), and for noting information (names, addresses, lists of 
photos taken). If it is appropriate to take notes during discussions — and one can 
readily imagine many situations when pulling out notebook and pen would inhibit 
unselfconscious conversation — then this notebook may be used for that also. 

The second notebook is generally a more substantial one. While some research- 
ers carry it around to refer to points, this book is normally used for the writing up 
of "field notes" (and diagrams) at the end of each day's research. (Relatively few field- 
workers use a portable computer, largely because of the amount of other equipment 
they already need to take with them.) Field notes are, ideally, an unedited record of 
all that happened and was said on each day of research, rather like a diary in many 
respects. Compiling field notes takes discipline and patience, in that there may be a 
great deal to write down from even a relatively straightforward day's interaction and 
observation. Furthermore, music events often occur in the evening, meaning that 
writing up typically occurs late at night; though all ethnomusicologists probably do 
it sometimes, leaving writing up until the following day seems to place a strain on 
the frailty of the memory. (Our memories may, nonetheless, be better than we fear, 
and tape-recorded interviews or lessons can obviously be accurately transcribed 
days after they occurred.) Each day's entry should be as thorough as possible, and 
not simply confined to the material pertinent to present research questions, so that 
future research can make use of these data as well. Some fieldworkers compile an 
index for each volume of their field notes, and many leave space at the beginning of 
the book for a detailed table of contents listing date, location, and other key factors 
or informants by name. A sample extract from one page of my own field notes is il- 
lustrated in Table 2.2. 
Table 2.2. Sample excerpt from field notes: ritual ensemble music in Taiwan.

Thursday 12 August 1999, Evening Baifushequ, Jilong, Taiwan p. 33 

There are twelve to fifteen [Juleshe] members present when I arrive, rehearsing a 
qu [kio in Taiwanese: instrumental piece] I have not heard before. They break off
as I come in and immediately ask me to sing Qi cun lian [the simple qu I was set to 
learn two days earlier]. I feel somewhat uncomfortable about disrupting the
rehearsal, but it seems that they are happy to break off. 

My first attempt goes none too well: I sing the wrong mnemonic (though 
right pitch) in line two and mispitch the leap between lines four and five. The 
musicians don't criticize me. Wang Anao (one of the two teachers) simply asks me 
to repeat. Now everyone has stopped to listen. Wang tells me to read from the 
notation and to beat [da pai] each time there is a "o" symbol in the score. Again, 
I perform poorly, hesitating sometimes or singing in time but with incorrect 
mnemonics. Wang (and several of the others) emphasize that I need to practice 
more, learn how to keep a steady beat, and memorize better than this. 

Qiu then fetched a suona [traditional oboe-type instrument] from the wall 
rack and played through the qu twice for me, while I sang along. A second 
musician joins in the second time. They comment that I seem to know the tune 
all right but aren't yet strong enough on the mnemonics to be able to read other 
pieces in notation well. Deputy troupe leader Ni fetched a can of beer from the 
fridge for me, a gesture that caused some teasing from younger ensemble 
members. "Beer! He has beer. How come we don't have any? Let's get some too!" 
(In fact, they didn't help themselves to any drinks.) Clearly, I am still a guest. Qiu 
and Lin started to chew betel nut, and discussion moved on to the qulu [kiolo] 
repertory — music in which an ensemble including stringed instruments 
accompanies vocal music. Wang explained that this would involve the bringing in 
of outside musicians. None of them are practising kiolo right now. He encouraged 
another senior member to sing [the role] Wang Zhaojun but the man declined — 
he had no voice (meiyou shengyin). Wang mentioned that the [City?] Cultural 
Department [Wenjian hui / Wenhua Jianshe Weiyuanhui] organizes a class for this 
in the afternoons but they'll be on holiday at present. 

I ask what I might plausibly be able to learn in the remaining six weeks. 
There is much declaration of this being too little time — one man states that he 
himself has been studying for fifteen years and has learned almost nothing; they 
spend four months revising tunes in each guan [kam, mode]. No specific
instruction is forthcoming. 



In this extract (approximately one third of the entry for that evening session), the 
notes are an unsystematic mixture of present- and past-tense commentary, with oc- 
casional reminders thrown in. (A few explanatory comments have been added so that 
the passage makes sense to nonspecialist readers.) There is a small amount of direct 
quotation, though not very much, largely a result of the mix of languages that char- 
acterized the session: some musicians spoke Taiwanese, which I am just beginning to 
learn, others then translated this into Mandarin, with which I am more familiar. Direct
quotation would be more common in cases where the fieldworker is properly familiar
with the language, in that we are normally concerned with how individuals express
themselves, not simply with what they say. Names of individuals are provided,
insofar as I knew them at that stage. There are data here on such issues as rehearsal 
procedures, modes of learning, and informality of interaction within the group, but 
this particular passage contains few analytical remarks or theoretical reflections. In 
fact, the writing up of notes is often a process that stimulates analytical thought. 
When it does, it is a good idea to note these thoughts down immediately (while labeling
them clearly as your own ideas and not views that one of the informants offered).
In the literature there are a number of further suggestions concerning the pro- 
duction of field notes. 5 Anthony Seeger, for instance, suggests six questions that the 
field researcher might seek to document (1992: 90): 

1. What is going on when people make music? What are the principles that organize
the combinations of sounds and their arrangement in time?

2. Why does a particular individual or social group perform or listen to the 
sounds in the place and time and context that he/she/it does? 

3. What is the relation of music to other processes in societies or groups? 

4. What effects do musical performances have on the musicians, the audience, 
and the other groups involved? 

5. Where does musical creativity come from? What is the role of the individual 
in the tradition, and of the tradition in forming the role of the individual? 

6. What is the relation of music to other art forms? 

Answers to these questions as recorded in field notes (and subsequently abstracted 
in analysis) embody empirical information in four respects. First, the information 
preserved in carefully assembled notes forms a broad palette of personal experience 
and socially founded perspectives on which specific musical and social interpreta- 
tions can be based; through participant-observation the researcher not only begins 
to understand how it feels and what it means to engage in a particular form of music 
making, but also is well placed to survey the views of other, probably more expert 
insiders. 

Second, this form of in-depth investigation is one in which considerable effort is 
made to be sensitive to music as made by "real" people in "real" musical contexts. (The 
quotation marks acknowledge that the specialist denizens and contexts of the uni- 
versity seminar room, philosopher's den, or psychologist's lab are real too.) Context- 
sensitive field research accordingly exhibits scholarly empiricism in its attention to 
differences between musical processes and musical products, and between musical 
practices and musical theories — what John Kaemmer (1993: 14) calls practical con- 
sciousness and discursive consciousness, terms intended to emphasize that we may 
be well able to do things we do not habitually talk about (or vice versa). 

Third, the gathering of field notes is empirical in that it pays particular atten- 
tion to the separation of the researcher's views and interpretations from those of the 
group or society under examination. Value judgments and assumptions on the part 
of the scholar are normally avoided, with preference given to the (probably multiple 
and often contradictory) views of informants. We may personally deem The Gondoliers
musically superior to Iolanthe, say, but the whole point of field research is to discover
what members of the Halesowen Light Opera Group and their supporters
think, not to impose our views upon them or to use them merely as stooges through
whom to put forward our own opinions. When we discover views that are deeply 
held, we seek to value and respect these, perhaps to analyze or interrogate them, but 
not to undermine them in our subsequent written studies. 

Finally, the detailed recording of data in properly maintained field notes — who 
said or did what, where, when, and under which circumstances — provides a re- 
source that may be returned to regularly in order to develop new hypotheses about 
music and culture. (Manuals on fieldwork recommend that the researcher reread the 
entire collection of field notes at regular intervals as a means of stimulating further 
reflection, picking up on theoretical musings jotted down earlier on, and catching 
emerging patterns within the data.) Naturally, it is also possible to return to field 
notes in later years; the society may itself have already changed in the meantime, and 
subsequent use of the data will need to acknowledge this possibility (or be accom- 
panied by a restudy). 

Interviewing 

Some of the problems of interviewing have already been mentioned. Nonetheless, it 
is a fact of life that the interview is sometimes the only means of speaking with cer- 
tain individuals. Given that fieldwork is intended primarily to discover how a cer- 
tain group of people understand their own music making, it is important that the re- 
searcher avoids asking leading questions. In certain communities it is impolite to 
answer in the negative, which means that yes / no questions are of little use. Else- 
where, employment of long or detailed questions may result in short or unrevealing 
responses, because the interviewee is not allowed space to formulate answers in terms 
of his or her own categories of thought — categories that may, in the end, prove more 
important than the specific facts fitted into them. Furthermore, the interviewer's per- 
ceived role as a "music expert" may lead interviewees to see the experience as akin 
to a test, with the result that they feel uncomfortable or struggle to second-guess the 
"right" answer. David Reck, Mark Slobin, and Jeff Todd Titon (1996: 514-515) il- 
lustrate two sample interviews with the same "consultant" in their advice to new field- 
workers: the first fieldworker's questions, it can be seen, imply that particular answers 
are expected, and thus close off reflection on the part of the interviewee, whereas the 
second fieldworker keeps the discussion going by asking open questions. 

Fieldworker 1: Did you get your first flute when you were a girl?

Consultant: Yeah.

Fieldworker 1: What was the name of your teacher?

Consultant: Ah, I studied with Janice Sullivan.

Fieldworker 1: When was that?

Consultant: In college.

Fieldworker 1: I'll bet you hated the flute when you first started, I can
remember hating my first piano lessons.

Consultant: Yeah.

Fieldworker 2: Can you remember when you got your first flute?

Consultant: Yeah.

Fieldworker 2: Could you tell me about it?

Consultant: Sure. My first flute — well, I don't know if this counts, but I fell
in love with the flute when I was in grade school, and I remember going
down to a music store and trying one out while my father looked on, but
I couldn't make a sound, you know!

Fieldworker 2: Sure.

Consultant: So I was really disappointed, but then I remember learning to
play the recorder in, I think it was third grade, and I really loved that but
I didn't stick with it. Then in college I said to myself, I'm going to take
music lessons and I'm going to learn the flute.

Fieldworker 2: Tell me about that.

Consultant: Well, I had this great teacher, Janice Sullivan, and first she taught
me how to get a sound out of it.

Film and Sound Recording 

There is less to say about film and sound recording than about field notes, partly because
most musicians are already experienced users of cameras and tape recorders, partly 
because quickly changing technology equally rapidly renders detailed technical ad- 
vice obsolete (other than the obvious "know your tools"), and finally because the 
principles that inform the use of film and tape in the field are similar to those that 
underlie the preparation of field notes. Nonetheless, two aspects of using these media 
will be briefly discussed here: the special contribution that they may make as re- 
search tools in themselves, and issues of preservation and documentation. 

Turning our attention first to film, it will be immediately apparent that research 
into music making in its context is likely to be stronger when the researcher pays at- 
tention to the visual dimension of the performance event. Multimedia genres such 
as opera and ballet offer obvious examples, but almost all musical performance 
makes meaningful use of location, space, and movement, whether in the form of the 
spatial deployment of organ and choir in a cathedral, the choreographic hand shifts 
of Chinese qin zither performance (Yung 1984), audience-performer interaction in 
Pakistani qawwali devotional singing (Qureshi 1995), the processing of competing 
brass bands through a Northern English town, or the sending outside in Taiwanese 
beiguan rehearsals of beginners who practice under the watchful tutelage of an ex- 
perienced musician (see Figure 2.2). Indeed, it is worth stressing again here the un- 
necessary restrictedness of the pervasive notion within much of the Western aca- 
demic community that music is specifically a sonic art. If, apparently like most 
people outside the academy, 6 we were to assume in our studies that the visual, mo- 
tional, and emotional aspects of music are just as central as its aural dimension, we 
would have taken a step toward constructing a musicology well placed to comment 
on music as social practice. 

Video and photographs provide an extra swathe of data, allowing the researcher 
to look repeatedly and in detail at aspects of the music event that might never be captured
in notation. Sometimes these extra data are clearly integral to the cultural practice.
Musicians in the Durham-based quintet Juke Box Jive, specialists in the historically
informed performance of 1950s-1960s rock, recognized (and demonstrated
during performance) the importance of the visual aspect: other than haircuts, 
clothes, and replica instruments (but replica only visually, as an up-to-date sound

Figure 2.2. A beginner beiguan player, assisted by Mr. Qiu, practices outside the main
Rehearsal Hall (Baifushequ, Jilong, Taiwan; September 3, 1999; photograph by C. Chou).



system was employed), their dance moves, energy of performance, and — as they 
stressed in discussion — youth were all part of the 1950s image. (In an irony the 
quintet understood very well, a genuine 1950s rocker would nowadays appear false, 
since he'd look too old.) 7 Moreover, study of these nonaural dimensions of per- 
formance can reveal the influence they exert on sound structures: for instance, Reg- 
ula Qureshi (1995: 148-174, 181-186) was able to map (on diagrams she named 
"videographs" and "videocharts") moments where direct audience encouragement 
led qawwali musicians to offer additional stanzas of certain songs, and where the per- 
ceived lack of audience interest led the ensemble to move quickly onto different 
strains. It is not just that interaction among those present shapes sound structure in 
certain circumstances, but that musical events can actually create, albeit temporar- 
ily, concrete models of social structure through which musically situated individuals 
move. Examples range from many forms of ritual through to secular dance-centered 
events: one does not have to be a social psychologist to find social structures in the 
forming of pairs who move (more-or-less) as one on such occasions. 

Unlike field notes, video and photographs can also be conveniently viewed by 
assembled members of the researched community, thereby encouraging further re- 
flection and conversation on the event by those who were involved, and opening up 
further informed interpretations to the researcher. (Sound recordings can often be 
profitably used in this way, too.) This is a particularly useful fieldwork device, in that 
it leads conversation around to what is significant in performance without the im- 
posed formality of the interview. Video and photography also offer a strong means 
of sharing ideas with subsequent audiences. In that it is normally the researcher who 
edits the film, selects the photographs, and composes a script to accompany these 
images, however, issues of representation pertinent to all social research are particularly
marked with regard to the use of film and photography, and I shall return to
this in the final section. 

Many researchers place the recordings they make (or copies of these) in archives, 
so that they can be used by others, including members of the researched commu- 
nity. Table 2.3 summarizes the information normally filed with any archived record- 
ing (this is in addition to information recorded in the field notes); while each archive 
may have its own standard formats for noting this material, it is probable that each 
will require information within the 11 categories laid out here. Much of this information
will have been gathered by the fieldworker and entered into field notes, but
a special effort will generally have to be made with regard to the question of copy- 
right and permissions. This can be a particularly thorny problem in contexts outside 
the jurisdiction of the researcher's legal system: for example, religious ritual partici- 
pants in communist China might well prefer no one to write their names down (as 
might street musicians in certain locations nearer home). A student attended an eth- 
nic drumming course in Britain where supplies of a mild (yet illegal) inhaled nar- 
cotic were shared around to encourage collective relaxation; while names can be 
changed in any published account that happened to mention this detail, the show- 
ing of the group on video (and their identification in the accompanying documen- 
tation) might break a tacit confidence. 
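
For researchers who keep their recording documentation electronically, the checklist in Table 2.3 can be sketched as a simple record type. What follows is a minimal illustration in Python, not a format prescribed by any archive: the names RecordingEntry, tape_code, and unambiguous_date are hypothetical, and the example values are borrowed loosely from the field-note excerpt purely for demonstration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def tape_code(region: str, medium: str, year: int, sequence: int) -> str:
    """Build a sequential code in the style of 'TAIWAN VID 99/01' (item 1)."""
    return f"{region.upper()} {medium.upper()} {year % 100:02d}/{sequence:02d}"


def unambiguous_date(d: date) -> str:
    """Write the month in letters, e.g. '12 May 1992' (item 2)."""
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"


@dataclass
class RecordingEntry:
    """One archived item; the fields follow the eleven checklist categories."""
    code: str                       # 1. video or cassette number
    recorded_on: date               # 2. exact date
    location: str                   # 3. exact location
    title: str                      # 4. one-line title for the event
    technical: str                  # 5. technical data
    participants: List[str] = field(default_factory=list)  # 6. participants
    instruments: List[str] = field(default_factory=list)   # 7. instruments
    contents: List[str] = field(default_factory=list)      # 8. items by track
    recordist: str = ""             # 9. recordist's name and contact details
    ethics: str = ""                # 10. ethical considerations
    acknowledgments: str = ""       # 11. funding bodies or similar

    def missing_fields(self) -> List[str]:
        """Names of checklist categories that are still empty."""
        return [name for name, value in vars(self).items()
                if value in ("", [], None)]


# Illustrative values only; this is not a real archive record.
entry = RecordingEntry(
    code=tape_code("Taiwan", "VID", 1999, 1),
    recorded_on=date(1999, 8, 12),
    location="Baifushequ, Jilong, Taiwan",
    title="Evening rehearsal of ritual ensemble music",
    technical="analogue, stereo, PAL",
)
print(entry.code, "|", unambiguous_date(entry.recorded_on))
print("Still to complete:", entry.missing_fields())
```

Listing the empty categories before a tape is deposited is one way of catching the items most easily overlooked in the field, notably permissions and ethical conditions.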



Coda: Representing Other Voices 

This last example has led directly to the issue of representation in social research. 
The ethnomusicologist and any other music researcher interested in speaking about 
"other" people faces a potentially complex web of ethical pressures and agendas. 
Clearly, we have a responsibility to those whom we represent in our writings (and 
films, lectures, and other offerings). This responsibility is not discharged simply by 
changing names and places in the published account (although this can help in cer- 
tain situations) or by specifically asking for permission for all that we do or write 
(which again can help). Not only are we building our careers on "their" expertise, 
but also, and particularly in the case of less well-known groups and societies, the 
scholar's published writings may become the only sources available, and thereby
shape other outsiders' expectations about the tradition or people in question. Kay 
Shelemay's studies of Ethiopian musical traditions suggested to her that the Falasha 
people (now more commonly known as the Beta Israel), many of whom were seek- 
ing evacuation to Israel, might not — as commonly believed — have descended from 
a formerly unknown Jewish tribe at all: given the potential political impact of this 
discovery, she had to consider how and where to publish her findings (Shelemay 
1991: 136-152). Our role as mediators between our informants and the academic 
world (and, through textbooks, school pupils) is also one that demands serious con- 
sideration. Kofi Agawu (1995) demonstrates how early studies of African music 
stressed difference to the degree that significant similarities between principles of Af- 
rican and European music making were overlooked. Effects of the construction of 
African music as "different" in kind included an emphasis on its exoticism (examples 



Table 2.3. Sample checklist for data to be preserved for each item recorded 
in the field. 



1. Video or cassette number. These are typically allotted sequentially throughout the
research trip as a whole, often with the year (and perhaps general location and 
medium) as a prefix: TAIWAN VID 99/01, for instance, refers to a video recorded 
in Taiwan in 1999. Apart from the obvious convenience of a numerical sequence, 
numbering tapes in this way allows the tapes to be marked up beforehand (on the 
tape itself as well as on the sleeve) such that errors are avoided when switching 
tapes over under pressure later on. This code number can be entered into the field 
notes when the tape is used. 

2. Exact date. It may avoid confusion among some potential users if the month is 
written in letters, for example "12 May 1992," rather than numerically "12/05/92." 

3. Exact location. For interview tapes at people's homes, recording their address 
enables the researcher to send a copy of the tape to those interviewed. (See also 10 
below.) 

4. One-line title for the event, such as "Beethoven Quartet Cycle at Clothworkers'
Hall, Leeds." 

5. Technical data: digital or analogue, stereo or mono, PAL or NTSC, noise reduction, 
duration, etc.

6. Participants. This may involve a list of names (perhaps annotated with instruments, 
roles or other duties) or be simply a note of ensembles or communities involved: 
"Choristers of Durham Cathedral directed by James Lancelot" or "c.200 mourners, 
most apparently from Manchester's Iranian community." When possible, an indi- 
cation of the size of the group can be valuable. In some cases, a printed concert 
program can be filed with the recording. 

7. Instruments. Unusual or flexible Western ensembles should be summarized and as 
full details as possible given for all other instrumental groups, including indigenous 
names as appropriate: "Chou Chien-Er (voice and wood clapper pie), Diu Zai-Hing 
(four-stringed lute gibti), Pan Lun-Mui (vertical notched flute xiao), Gang Mo-Gen 
(three-stringed lute samhen) and Cua Tiam-Mo (two-stringed fiddle lihen)." 

8. Items, pieces, or other recording content by track (with real-time durations), 
including mention of language of speech or singing when other than English (or 
when use of English is itself significant, for instance in Italian opera performance or 
German punk rock). Cross-reference to page(s) of field notes where these items are 
discussed in more detail, or a copy of those pages. 

9. Recordist's name and contact details. Other users may need to contact you. 

10. Any ethical considerations. Researchers are sometimes permitted to record materi- 
als for their own use but not for public broadcast or publication. If this is the case, 
or if there is doubt about an aspect of the copyright of the performance recorded, 
reference should be made to special conditions ("for reference use only, no copies 
to be removed from the library without written authorisation"), to performers' (or
fieldworker's) contact addresses, or to a detailed copyright release. Where appropriate,
it is helpful to discuss all this with the performers prior to archiving their
materials; the rise of digitization and automated remote access to archived materials 
has reinforced this need. 

11. Acknowledgments to funding bodies or similar.
that sounded Western were seen as unrepresentative); in seeking to establish African
music as a topic worthy of attention, scholars went to great lengths to provide in- 
terpretations of African musical activity (polyrhythm, "metronome sense," etc.) that 
ultimately reveal more about the assumptions of these scholars than they do about 
African music. 

Equally, we have a responsibility to our audiences: we are not simply the mouth- 
piece of our informants. Our training may give us perspectives on an issue unavail- 
able (and possibly uninteresting) to our informants. We may detect inconsistencies in 
positions taken by individuals or groups, uncover moments where traditions are con- 
structed by historical revisionism or invention, or wish to compare the views of rival 
groups; Georgina Born's (1995) analysis of the institutionalization of the Parisian 
musical avant-garde in the Institut de Recherche et de Coordination Acoustique/ 
Musique (IRCAM) offers one example of a study in which the author subjects the as- 
sumptions of the researched community to (highly revealing) critical examination. 

One means of writing that is widely employed is that of letting the researched 
speak for themselves, and thereby distinguishing clearly between internal views and 
the researcher's own reading of the situation. But the researcher, as author, still se- 
lects which voices get to be heard, how much they are allowed to say, and when they 
speak — so that the use of quotations does not eliminate the issue of representational 
ethics. (Indeed, some argue that the potential for scholarly misrepresentation be- 
comes all the stronger.) The onus remains on the researcher to find an honest and 
sensitive solution to the particular representational challenges exposed during the 
project. Effective use of informants' voices has been made by Paul Berliner (1994) in 
his study of jazz; his authorial presence is often limited to introducing each para- 
graph or summarizing and abstracting key observations from the many musicians he 
quotes on each new theme. Sara Cohen's (1994) analysis of indigenous views on 
what constitutes the "Liverpool sound" offers a second, if less extreme, example (see 
also Herbst 1997). 

There is a need, I would argue, for more research on Western musical traditions 
that seeks to understand how people outside the academy make sense of music. For 
such study to be effective, traditional academic assumptions about what constitutes 
music will need to be set aside, the researcher proceeding instead from a sensitive 
and careful analysis of what those personally involved value, do, and say in real mu- 
sical situations. Participant-observation offers one well-established means of engag- 
ing with these individuals and communities within their own territory and on their 
own terms. It also offers the student of music a demanding but absorbingly human 
means of research. 



Notes 



1. A sonograph (or, in America, melograph) is an electronic device that charts along the
vertical axis on graph paper the frequencies, amplitude or, potentially, other features 
of a stream of sound against duration along the horizontal. The machine is able to 
detect pitches and details that are inaudible to the human ear, and does not interpret 
the sound into any preconceived tonal or rhythmic system. 
2. Ethnomusicologists do nonetheless interview informants, and a smaller number use
questionnaires and other techniques from quantitative sociological research. 

3. This empiricism, like that of previous approaches, is not without its own assump- 
tions, and these too have been interrogated in more recent writing. (There is a huge 
literature on this in anthropology, from which many ethnomusicologists draw; see, 
e.g., Clifford and Marcus 1986, Geertz 1988, van Maanen 1988, and Jackson 1989.) 
Most pointedly, perhaps, questions have been asked about the empirical nature of 
evidence about "other societies" the music researchers themselves help to create 
(through discussion, recording, and performance, among other means), and then re- 
formulate into a final written account that employs rhetorical devices and theoretical 
constructs with definite roots in the Western academic tradition. 

4. Henry Kingsbury used the first model of research at two American music schools; 
see Kingsbury 1988, and compare this with Bruno Nettl's more generalist account of 
these institutions (1995). 

5. Several ethnomusicological authors have reflected on the processes, rhetoric, and 
politics of writing in the field (see, e.g., Barz and Cooley 1997). Fieldwork is, of 
course, a key research mode in most of the social sciences, including folklore, geog- 
raphy, anthropology, and sociology, and writers in each of these disciplines have 
written about the preparation of field notes; useful texts from these fields include 
Briggs 1986, Jackson 1987, Sanjek 1990, and Hammersley and Atkinson 1995. 

6. Students of mine over the past six years have as a class exercise each spoken with 
five so-called "nonmusicians" to attempt to discover these individuals' definitions of 
music. With a total sample of nearly 1,000 people who have answered these ques- 
tions, it is perhaps significant that a higher proportion have used words like "iden- 
tity," "feeling," and "mood" than have referred to "sound" or related sonic terms. 

7. In proper ethnomusicological-cum-empirical fashion, this footnote gives the date 
and location of the specific meetings at which these observations were made: Juke 
Box Jive in performance (November 14, 1997, Sunderland) and personal communi- 
cation, November 21, 1997, Durham. 

8. A revised or invented tradition is, nonetheless, every bit as valuable, all else being 
equal, as one with an unchallenged historical record: comments about authenticity 
above can be reiterated here. The point is that a group may not wish their claims to 
authenticity or ownership to be critiqued at all. 



References 

Abraham, O., and E. M. von Hornbostel (1909). "Vorschläge für die Transkription exo-
tischer Melodien." Sammelbände der Internationalen Musikgesellschaft 11: 1-25.

Agawu, K. (1995). "The invention of African rhythm." Journal of the American Musicological
Society 48: 380-395.

Bartók, B. (1931). Hungarian Folk Music. London: Oxford University Press.

Bartók, B. (1967). Romanian Folk Music, 3 vols., ed. B. Suchoff. The Hague: Martinus
Nijhoff.

Barz, G. F., and Cooley, T. J. (eds., 1997). Shadows in the Field: New Perspectives for Field-
work in Ethnomusicology. New York: Oxford University Press.

Berliner, P. F. (1994). Thinking in Jazz: The Infinite Art of Improvisation. Chicago: University
of Chicago Press.

Born, G. (1995). Rationalizing Culture: IRCAM, Boulez, and the Institutionalization of the Mu- 
sical Avant-Garde. Berkeley: University of California Press. 



Briggs, C. L. (1986). Learning How to Ask: A Sociolinguistic Appraisal of the Role of the Inter-
view in Social Science Research. Cambridge: Cambridge University Press.

Bronson, B. H. (1949). "Mechanical help in the study of folk song." Journal of American
Folklore 62: 81-90.

Bronson, B. H. (1959). "Toward the comparative analysis of British-American folk tunes."
Journal of American Folklore 72: 165-191.

Clifford, J., and Marcus, G. E. (eds., 1986). Writing Culture: The Poetics and Politics of
Ethnography. Berkeley: University of California Press.

Cohen, S. (1994). "Identity, place and the 'Liverpool sound'," in M. Stokes (ed.), Ethnicity,
Identity and Music: The Musical Construction of Place. Oxford: Berg, 117-134.

Ellis, A. J. (1885). "On the musical scales of various nations." Journal of the Society of
Arts 33/1,688 (March 27, 1885): 485-527. [Repr. in K. K. Shelemay (ed.), Garland
Library of Readings in Ethnomusicology 7, 1-43. New York, 1990.] See also Journal of
the Society of Arts 33/1,690 (April 10, 1885): 570 (correspondence from Ellis con-
cerning the article); and "Appendix to Mr. Alexander J. Ellis's paper on 'The Musical
Scales of Various Nations,' read 25th March, 1885." Journal of the Society of Arts 33/
1,719 (October 30, 1885): 1102-1111.
Geertz, C. (1988). Works and Lives: The Anthropologist as Author. Cambridge, UK: Polity
Press.

Hammersley, M., and Atkinson, P. (1995). Ethnography: Principles in Practice, 2nd ed.
London: Routledge.

Herbst, E. (1997). Voices in Bali: Energies and Perceptions in Vocal Music and Dance
Theatre. Hanover, N.H.: Wesleyan University Press and University Press of New
England.

Hood, M. (1960). "The challenge of 'bi-musicality'." Ethnomusicology 4: 55-59.

Hood, M. (1971). The Ethnomusicologist. New York: McGraw-Hill.

von Hornbostel, E. M., and Sachs, C. (1914). "Systematik der Musikinstrumente."
Zeitschrift für Ethnologie 46: 55-90. [Eng. trans. by A. Barnes and K. P. Wachsmann
in Galpin Society Journal 14 (1961): 3-29.]

Jackson, B. (1987). Fieldwork. Urbana: University of Illinois Press.

Jackson, M. (1989). Paths Toward a Clearing: Radical Empiricism and Ethnographic Inquiry.
Bloomington: Indiana University Press.

Kaemmer, J. E. (1993). Music in Human Life: Anthropological Perspectives on Music. Austin:
University of Texas Press.

Kingsbury, H. (1988). Music, Talent, and Performance: A Conservatory Cultural System.
Philadelphia: Temple University Press.

Lomax, A. (1968). Folk Song Style and Culture. Washington, D.C.: American Association
for the Advancement of Science.
van Maanen, J. (1988). Tales of the Field: On Writing Ethnography. Chicago: University of
Chicago Press.

Myers, H. (1992). "Fieldwork," in Helen Myers (ed.), Ethnomusicology: An Introduction.
London: Macmillan, 21-49.

Nettl, B. (1995). Heartland Excursions: Ethnomusicological Reflections on Schools of Music.
Urbana: University of Illinois Press.

Qureshi, R. B. (1995). Sufi Music of India and Pakistan: Sound, Context and Meaning in
Qawwali. Chicago: University of Chicago Press. [Originally published by Cambridge
University Press, 1986.]

Reck, D., Slobin, M., and Titon, J. T. (1996). "Discovering and documenting a world of
music," in J. T. Titon (gen. ed.), Worlds of Music: An Introduction to the Music of the
World's People, 3rd ed. New York: Schirmer, 495-519.



Sanjek, R. (ed., 1990). Fieldnotes: The Makings of Anthropology. Ithaca, N.Y.: Cornell Uni-
versity Press.

Schneider, A. (1991). "Psychological theory and comparative musicology," in B. Nettl and
P. V. Bohlman (eds.), Comparative Musicology and Anthropology of Music: Essays on the
History of Ethnomusicology. Chicago: University of Chicago Press, 293-317.

Seeger, A. (1987). Why Suya Sing: A Musical Anthropology of an Amazonian People. Cam-
bridge: Cambridge University Press.

Seeger, A. (1992). "Ethnography of music," in H. Myers (ed.), Ethnomusicology: An Intro- 
duction. London: Macmillan, 88-109. 

Seeger, C. (1966). "Versions and variants of the tunes of 'Barbara Allen' in the Archive of 
American Folksong in the Library of Congress." Selected Reports in Ethnomusicology 
1/1: 120-167. 

Sharp, C. J. (1954). English Folk Song: Some Conclusions. London: Methuen. [First pub- 
lished 1907.] 

Shelemay, K. K. (1991). A Song of Longing: An Ethiopian Journey. Urbana: University of Illi-
nois Press.

Stumpf, C. (1886). "Lieder der Bellakula-Indianer." Vierteljahrsschrift für Musikwissenschaft
2: 405-426.

Yung, B. (1984). "Choreographic and kinesthetic elements in performance on the Chinese 
seven-string zither." Ethnomusicology 28: 505-517. 



CHAPTER 3 



Musical Practice and Social Structure: 
A Toolkit 

Tia DeNora 



The sociology of music has a strong empirical tradition, yet retains inspiration from 
its more philosophically oriented past. For sociologists, especially in recent years as 
the field has experienced a cultural and interpretative turn, the study of music has 
been linked to wider questions concerning social structure, stability and change, the 
interaction between social networks and musical production, the emotions, the body, 
the study of social movements, identity politics, and organizational ecology. In all 
these areas, sociologists of music have sought to ground their enquiries through the 
use of empirical methods designed for the scrutiny of behavioral trends, organiza- 
tions, and forms of action. In this chapter I take stock of the sociology of music's "tool- 
kit" and present some of the best-known empirical work within the field. My discus- 
sion is organized around two broad areas of study: musical production and musical 
consumption. To contextualize these topics, and to differentiate the empirical soci- 
ology of music from musicology's growing interest in social constructionism, I begin 
with a brief sketch of classic, and more overtly theoretical, work in music sociology. 



Sociology of Music: The Classic Legacy 

The most sociologically ambitious theoretical perspective to be developed during 
the last century is to be found in the work of T. W. Adorno (1903-1969). Adorno's
perspective is distinguished by its comprehensive vision, and for the central place it 
accords to music within modern (and, as Adorno perceived, often repressive) cul- 
ture and social formation. 

In contrast to Max Weber's more formal concern with the origins of musical- 
technical practices specific to the West (Weber 1958), Adorno focused on the ques- 
tion of music's ideological dimension. In line with classical philosophers such as Plato 
and Aristotle, he pursued the question of music's ability not only to reflect but also 
to instigate or reinforce forms of consciousness and social structures. For Adorno, 
different forms of music were homologous with (structurally parallel to, and thus 
able to inculcate) cognitive habits, modes of consciousness, and historical develop- 
ments. As he saw it, music's compositional processes — its degree of conventionality, 
the interrelation of musical parts or voices, the arrangement of consonance and dis- 
sonance — could serve as means of socialization. This ultimately structuralist notion 
is perhaps best exemplified by considering Adorno's views on the contradictory pos-
sibilities for consciousness posed by twentieth-century musical forms. On the one 
hand, he believed that Schoenberg's music could enable critical consciousness be- 
cause, through its processes of composition — for example, its use of dissonance and 
formal fragmentation — it modeled a mode of critical attention to the world that re- 
fused to offer "false" musical comfort. On the other hand, jazz, Tin Pan Alley, and 
other popular genres inculcated psychological regression and infantile dependency 
(Adorno 1990; Witkin 1998), providing, in the age of "Total Administration," a 
medium that "trains the unconscious for conditioned reflexes" (Adorno 1976: 53).
"Wrong" music thus had to be denounced, and for this reason, Adorno considered 
socio-musical study to possess a special urgency: given music's capacity to "aid en- 
lightenment" (Adorno 1973:15), socio-musical analysis was nothing less than a tool 
for liberation. 

These are certainly profound questions and ones to which musicologists are in- 
creasingly drawn. During the 1970s, interest in Adorno's work was located periph- 
erally within musicology (e.g., Subotnik 1976, 1978, 1983; see also Subotnik 1990 
and McClary 1991:175n). During the 1980s and 1990s, by contrast, musicologists 
increasingly turned to Adorno. While they generally rejected his dismissal of popu-
lar culture and his notion that truly great, liberatory music was that which had "es-
caped from its social tutelage and is aesthetically fully autonomous" (Adorno 1976:
209), they took up Adorno's concern for music's social and ethical character. In par- 
ticular, they sought to "ground" musical works, and the values embodied in them, 
either through showing how musical representations inscribed social relations (Su- 
botnik 1983; Leppert and McClary 1987), or through relating them to a cultural his- 
tory or psychology of music consumption (Cook 1990, Frith 1996, Johnson 1995). 
Consideration of how musicologists responded to Adorno, and more broadly of the 
way in which they adopted a social-critical perspective, helps to illuminate some of 
the differences between musicology's and sociology's "toolkits." It also provides a
springboard into contemporary sociology's more "action-oriented" focus on music as 
social practice, its shift away from a homology-centered, structuralist paradigm, and 
its quite different take on the "social construction" of music. 

Ten years on from Goehr (1992) and Randel (1992), a form of social construc- 
tionism thrives in musicology, one that opposes itself to traditional understandings 
of what is "natural" in music. Even basic, previously taken-for-granted concepts 
such as the musical "work" have been deconstructed, shown to be purely social con- 
structions of restricted historical and geographical application. Today, most musi-
cologists would probably agree with Randel's apt observation that musicology's tra-
ditional "toolbox" was designed for the construction and maintenance of a canon of
acceptable topics, namely, works and composers. But, as I shall suggest in this chap- 
ter, the forms of constructionism now prevalent within musicology are, from a cur- 
rent sociological perspective, not so different from the structuralism characteristic of 
Adorno's work. Although there are some notable exceptions, particularly studies of 
musical listening, reception and use, constructionist approaches in musicology still 
center on works, and on critical readings of them that aim to reveal the music's so- 
cial content. 

In the writings of Lawrence Kramer and Susan McClary, for example, we are di- 
rected to see music as structurally similar (homologically linked) to social phenom-
ena, or as a "representation" of some extramusical phenomenon. The methodologi- 
cal toolkit here — uncovering intertextual allusion, identifying conventional tropes 
and the ideological connotations and functions of these tropes, comparing (some as- 
pect of) music's structure with (some aspect of) the structure of something else — 
maintains a separation between works and the actual contexts of their production 
and reception. While social contexts and contents are the ultimate quarry of this 
type of "New" musicology (as the work of such writers as Kramer and McClary was 
termed in the 1990s), they are typically pursued through the analysis of texts, rather 
than through more ecological, empirically oriented investigations of the production, 
distribution, and consumption of music. Such a move also sidesteps the contested 
meanings that arise within particular contexts, for example, through resistance to 
particular musicological interpretations. In short, it is impossible to specify music's 
mechanism of operation: there is no methodology for describing music as it acts 
within actual social settings, specific spaces, and in real time. 

I do not here wish to imply that sociology cannot benefit from or be compatible 
with this type of text-based musicological constructionism; on the contrary, a weak- 
ness of sociology has been its failure to deal with music's specifically musical mate- 
rials, and here textual interpretation and analysis can help to draw sociological stud- 
ies on to more firmly musical terrain. Nor do I wish to imply any clear division of 
labor between musicology and sociology; some of the best "sociological" work on 
musical topics is currently being done by musicologically trained scholars (e.g., 
Pasler forthcoming). Rather, I wish to contrast the textual focus of "New" musicology 
with the emphasis of the sociology of music, particularly since the late 1970s, on an 
action-based paradigm — one that is concerned with the matrices and milieus in 
which action is framed and effected. Howard Becker (1989: 282) put his finger on the 
difference when he wrote, with disarming clarity, that sociologists of his persuasion 
(generally termed "social interactionists") "aren't much interested in 'decoding' art- 
works [but rather] prefer to see these works as the result of what a lot of people have 
done jointly." This version of constructionism treats music as a social process, focus- 
ing on how musical structures, interpretations, and evaluations are created, revised, 
and undercut with reference to the social relations and contexts of this activity. It is 
also concerned with how music provides constraining and enabling resources for so- 
cial agents — for the people who perform, listen, compose, or otherwise engage with 
it. As the sociologist Pete Martin (1995: 42) has observed, "in general this 'turn to 
the social' in musicological studies has not led to a sustained engagement with the 
themes and traditions represented within the established discourse of sociology" — 
themes and traditions that are at some remove from Adorno and his structuralist 
perspective. Martin calls instead for a focus on music as it is lived and experienced, 
quoting the Swedish musicologist/ethnomusicologist, Olle Edström, on how the
members of his group at Gothenburg responded to Adorno: "we gradually gained a
deeper insight into the pointlessness of instituting theoretical discourses on music
without a solid ethnomusicological knowledge of the everyday usage, function and
meaning of music" (Edström 1997: 19, quoted in Martin 2000: 42).

Edström and Martin both allude here to a shift in focus from abstract theory and
"macro" issues (such as systems, societal structures, and norms) to grounded theory 
and "micro" concerns (such as a focus on individual and collective practice). Part of
this shift centers upon the concept of social agency, on how both social and musical 
forms (including meanings) are put together or accomplished jointly, in Becker's 
sense. This focus on activity is, as I shall argue intermittently throughout this chap- 
ter, a very useful perspective. It is dedicated to elucidating the links between social 
and musical structures in ways that are more than hypothetical. It conceives of the 
music-society nexus in terms of the pragmatic contexts within which musical works 
take shape and come to have "effects" in real situations. This focus on action pro- 
vides an alternative to homological models and their text-centered methodological 
toolkit — to the emphasis, pace Adorno, Attali (1985), and Shepherd (1991), on how 
music "reflects," "anticipates," or is structurally analogous to social developments 
or social structures. From a social-interactionist perspective, then, neither Adorno- 
inspired sociology of music nor musicology's version of constructionism is sufficient 
for illuminating ("grounding") music's sociality. The problem with both these in- 
herently structuralist, text-centered modes of study is simply this: they are oriented 
to the recognition of patterns and structural affinities between two or more realms 
(music and some aspect of society — ideology, gender or class relations, identities, 
cognitive styles), but they are not able to document the mechanisms that create these 
patterns, that is, to describe how music informs or enters social life, and vice versa. 
They assert links between music and society, but their methodological toolkits do 
not equip them to show these links in terms of how they are established and how 
they function within actual musical and social contexts. 

By contrast, newer sociological perspectives concerned with social agency in- 
vestigate the social processes through which these links are forged. As the French so- 
ciologist Antoine Hennion says, "it must be strictly forbidden to create links when 
this is not done by an identifiable intermediary" (1995:248). By this, Hennion 
means that while music may be, or may seem to be, interlinked to "social" matters, 
for example, patterns of cognition, styles of action, ideologies, institutional arrange- 
ments, such links should not be assumed. Rather, they need to be specified (ob- 
served and described) at their levels of operation, for instance in terms of how they 
are established and come to act. We need, in short, to follow actors in and across situ- 
ations as they draw music into (and draw on music as) social practice. And this is 
where empirical methods come into their own within the sociology of music. There 
are good parallels and precedents to be found in the social study of another "tech- 
nical" realm: science and technology, in particular in the study of science-in-the- 
making (Knorr-Cetina 1981; Latour and Woolgar 1986; Bijker, Hughes, and Pinch 
1987; Latour 1987). It should be underlined here that these studies of scientific 
practice and knowledge formation, most of which have been conducted by sociolo- 
gists with advanced training in the sciences, have concentrated on action — on the 
situated production of scientific matters of fact, step by (sometimes contested) step. 
In this respect, such action-based studies move well beyond more general concerns 
with the parallels between science and society. And some recent studies of this sort 
have begun to focus explicitly on music technology and musical culture (e.g., Pinch 
and Trocco 2002). 

It is, then, in the focus on culture-producing worlds that the sociology of music 
has found its empirical feet, and thus a way to ground its claims about the links be- 
tween music and society. More specifically, as I will describe below, such work cen-
ters on action: on musical practices in and across musical and extramusical realms. 
For example, it is concerned with musically engaged actors as they constitute (and 
negotiate the constitution of) music through performance, through coordination, 
and through reception. It is also concerned with how these constitutive processes in 
turn draw upon music to constitute other social realities, realities that may exceed 
the musical but that may, simultaneously, be articulated with reference to music. 
And with this focus it is possible to dispense with the music-society dichotomy, and 
to think instead of musical practice as, inevitably, social practice. 



Sociology of Music: Musical Production and Its Milieux 

During the 1970s and 1980s, and particularly in the U.S. and U.K., new paradigms were
developed that sought to explore music's links to social processes and contexts rather 
than structures. Here, music was conceptualized, simply, as social activity. Known as 
the "production of culture" approach, and developed by scholars such as Peterson 
(1976), Wolff (1981), Becker (1982) and Zolberg (1990), this perspective provided 
an effective antidote to the overly theoretical character of Adorno-influenced mod- 
els. It reinvigorated the sociology of music in its emphasis on action and action's ma- 
trices. It reconceptualized the composer, or music producer, as a member of a mu- 
sical world or community, and as working with and abiding by (or reacting against) 
conventions and work practices in order to make music. This view was deliberately 
prosaic; the production of culture approach sought to demystify the romantic notion 
of "the composer" and its attendant ideology of the genius in the garret. 

Karen Cerulo's (1984) study of change in musical composition across six coun- 
tries during the Second World War serves to illustrate these points. Cerulo focused on 
the social disturbance brought about by war and its relation to music-compositional 
practice. She examined the prewar and wartime activities of composers whom she 
divided into two groups, those located in combat zones and those who operated in 
more stable environments. She began with the hypothesis that the work of com- 
posers located in areas most characterized by social upheaval due to war would ex- 
hibit most evidence of stylistic change, with composers based in non-combat zones 
showing less evidence of change in their compositional styles and practice. She es- 
tablished a sample of wartime works, focusing on pieces that were intended by their 
composers explicitly as reactions to the war, and compared these with prewar works 
by the same composers so as to identify any changes in style during the war years. 
Government-sponsored works were excluded, on the grounds that they may have 
needed to portray official sentiments (through uplifting march rhythms and so on). 

Thus delimited, Cerulo's sample consisted of 16 works by 14 composers over 
six countries — combat zone (wartime England, France, Hungary, Germany, and 
Russia) and noncombat zone (prewar England and the U.S.). These works were ex- 
amined in terms of the following features, conceived of as dependent variables (see 
this volume, p. 219, for a definition of dependent variables): melodic structure, 
tonality, dynamics, rhythm, medium of expression and form. ("For purposes of ped- 
agogical vividness and ease of exposition," however, Cerulo's discussion of her find- 
ings focused primarily on melody.) In particular, Cerulo sought to measure the de-
gree to which melodies were conjunct ("smooth gradations") or disjunct ("leaping 
motion") before and after the onset of war in each zone. She plotted melodic pitches 
using crotchets (one for each new pitch) so as to achieve a graph of melodic spac-
ing for each work. She found that, whereas before the onset of war the works of all
composers in the sample, combat zone and noncombat zone alike, exhibited jagged
melodies, after the beginning of the war those in combat zones became
conjunct and lengthy, while those in noncombat zones remained unchanged (1984:
892). From this, Cerulo concluded that she had found evidence for the impact of
disruption on compositional practice. She then turned to the critical question: how 
was one to explain this apparent shift in compositional practice? 
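
The distinction between conjunct and disjunct melody described above can be given a rough numerical form. The following is only a minimal sketch and does not reproduce Cerulo's own graphical procedure: it assumes that pitches are coded as MIDI note numbers, that any interval larger than two semitones counts as a leap, and that the two example melodies (which are invented for illustration) stand in for real works. The function names and the threshold are likewise hypothetical.

```python
from typing import List, Sequence

# Assumption: an interval of at most 2 semitones (a whole tone) counts as a
# "step" (conjunct motion); anything larger counts as a "leap" (disjunct).
STEP_LIMIT = 2


def intervals(pitches: Sequence[int]) -> List[int]:
    """Return the absolute interval, in semitones, between successive pitches."""
    return [abs(b - a) for a, b in zip(pitches, pitches[1:])]


def disjunct_ratio(pitches: Sequence[int], step_limit: int = STEP_LIMIT) -> float:
    """Return the proportion of melodic intervals that are leaps (0.0 to 1.0)."""
    steps = intervals(pitches)
    if not steps:
        raise ValueError("need at least two pitches to measure melodic motion")
    leaps = sum(1 for s in steps if s > step_limit)
    return leaps / len(steps)


# Hypothetical melodies, one pitch per note, coded as MIDI note numbers.
stepwise_melody = [60, 62, 64, 65, 67, 65, 64, 62, 60]  # moves by step only
leaping_melody = [60, 67, 62, 72, 64, 55, 69, 60]       # moves by leap only

if __name__ == "__main__":
    print("stepwise:", disjunct_ratio(stepwise_melody))
    print("leaping:", disjunct_ratio(leaping_melody))
```

A comparison of prewar and wartime works by the same composer would then amount to comparing such ratios, although any single threshold is a simplification of what a graph of melodic spacing shows.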

While older sociological paradigms might have pointed to a homology (or re- 
verse homology) between disruption in society and conjunction in music, with per- 
haps an associated psychological explanation of trauma and its impact on com- 
posers' needs for consonance and congruence of musical material, Cerulo took a 
different and more pragmatic tack. She emphasized instead how war-zone com- 
posers were cut off from normal music-world interactions, from information and 
communication with fellow composers, and from access to music publications: "The 
loss of contact with peers experienced by Combat Zone composers destroyed their 
professional community." This, in turn, Cerulo suggested, "led to the unraveling of 
the normative prescriptions that govern techniques of composition. Consequently, 
in the absence of both a supportive system and its enforcement by contemporaries 
of normative adherence, composers deviate from their current paradigm of musical 
construction" (1984: 900).

To be sure, these conclusions may provide a source for fruitful debate by music 
historians: why, for example, if changes in stylistic practice were a function of loss of 
normal networks and communication patterns, should the deviation of isolated war- 
zone composers all exhibit the same basic tendency — the shift from disjunct to con- 
junct melodic lines? How might the study benefit from more detailed consideration 
of the individual work-lives of composers? Does the graphical method of plotting 
melodic movement provide a valid means for comparing different melodic struc- 
tures? Could identification and measurement of the parameters of compositional 
material be combined with an ethnographic understanding of the meanings (local, 
regional, biographical) associated with musical materials and practices? I suggest 
that the value of Cerulo's work (and the justification for reading it today) lies in her 
general interrogative strategy, her bold attempt to specify measurement techniques 
for the study of compositional practice and, in particular, her focus on production 
networks and communication as a determinant of this practice. 

Cerulo's study is important in the present context not only because it was one 
of the first sociological works to deal with musical forms and stylistic change, but 
also because it can be regarded as a pivot between the older homological model and 
the newer approach, with its emphasis on music-producing worlds and on the so- 
cial contexts of artistic production. As Cerulo (1984: 885) put it: "the limited body 
of literature dealing with the transection of artistic creation and social structure con- 
sists almost entirely of large-scale, speculative theories which are heavily influenced 
by sociohistorical arguments, and whose illustrative support often rests on the sty-
listic and structural changes in the music of a single composer, or a particular musi- 
cal tradition." While seeking to distance herself from "speculative theory," Cerulo 
also set her sights on matters that connected back to the grand tradition within 
music sociology — concerns that were addressed by the earlier homological perspec- 
tives she sought to transcend. On the one hand, her work can be read as in contrast 
to structuralist approaches, such as Lomax's (1968) "cantometric" investigation of 
correspondences between song styles and societal structures. (For Lomax, song styles
reflect societal forms and, thus, habits of mind congruent with these forms;
see this volume, p. 17, for further details.) On the other hand, Cerulo wished to re-
tain Lomax's concern with musical style and its variation across social space — too 
often, she argued, ignored by the new perspectives and their focus on production, 
markets and patronage — while linking that concern with a focus on the production 
circumstances of composers. In this sense Cerulo's study represented a pioneering 
attempt to illuminate the "transection," as she put it, of structure and creation: that 
is, to devise means of measuring the impact of a changed social context on creative 
activity in music. 

By 1989, the "production" perspective was firmly established in not only the 
anglophone but also the francophone world, after Pierre Bourdieu's (1984) work on 
taste publics and social classification systems, and Bruno Latour's studies of science 
worlds and science in the making (1987). These perspectives and the various pub- 
lications that issued from them drew upon detailed empirical study — ethnogra- 
phies, cultural and social histories, quantitative surveys, and studies of institutions. 
It was precisely what Becker referred to as "what a lot of people have done jointly" 
that formed the focus of sociological investigation between, roughly, 1978 and the 
middle 1990s. In retrospect, the contributions of these years may be set in one of 
three broad categories: (1) conditions of production, (2) the construction of musical
value and reputation, and (3) musical tastes, consumption, and social identity. 



Conditions of Musical Production 

Cerulo's work is representative of a large number of studies aiming to show how the 
content of musical works is shaped in relation to musicians' working conditions. 
Elias's (1993) pioneering consideration of Mozart, for example, suggests that Mozart's 
compositional scope was hampered by his location between two patronage modes 
and his inability to escape the shackles of aristocratic control. Similarly, Becker's 
(1963) study of dance musicians documents how career patterns and occupational 
opportunities are shaped by patrons and by the need to find a fit between musicians' 
aspirations and tastes and what their publics will tolerate. Not only are individual 
compositional practices affected by productional organization, but so too is the se- 
lection of compositions that are ultimately produced and marketed. Peterson and 
Berger (1990 [1975]) illustrated this point in a highly influential study that revealed 
how musical innovation was enabled and constrained by infrastructural features of 
the pop music industry; their work suggested that innovation in pop arises from 
competition between large record companies and their smaller rivals, showing that 
diversity in musical forms (as they are produced and reach their publics) is inversely
related to market concentration. At the time their article was published, Peterson
and Berger were trailblazers for the "production of culture" perspective, and their 
study still serves as a model of how to conduct work in this tradition. 

Peterson and Berger examined number one hit songs over 26 years of record 
production, from 1948 to 1973, dividing this period into five eras of greater and 
lesser degrees of market concentration. Eras of high market concentration were 
those in which a high proportion of the annual production of hits was produced by 
one of the four leading companies: during such eras, these companies controlled 
over 75 percent of the total record market (in fact just eight companies produced 
nearly all the hit singles). Peterson and Berger considered whether such concentra- 
tion bred homogeneity of product, pursuing this question by examining the sheer 
number of records and performers who recorded the hits during their five eras; the 
thinking was that there might be little incentive to introduce "new" products under 
conditions of market concentration. They also examined the lyrical content of hits, 
tracing these variables through the five eras as competition between record compa- 
nies grew and then diminished over the 26-year period. Simultaneously, they con- 
sidered indicators of what they termed "unsated demand," such as changes in record 
sales and the proliferation of music disseminated through live performance and 
backed up by independent record producers — genres such as jazz, rhythm and 
blues, country and western, gospel, trade union songs, and the urban folk revival. 
They then considered the conditions under which the independent producers were 
able to establish more secure market positions as the top four producers lost control 
of merchandising their products over the radio. Finally they traced how the record 
industry and its degree of market concentration expanded and contracted cyclically 
over time. 
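
The kind of measure described above can be made concrete with a short sketch. It is an illustration only: the firm names, the toy list of hits, and the function names four_firm_share and diversity_counts are assumptions introduced here, and none of the figures reproduce Peterson and Berger's data or their exact procedure. The sketch computes, for a given year, the share of hits released by the four most productive firms, together with the number of hit records and of distinct performers.

```python
from collections import Counter

# Hypothetical toy data: one (year, firm, performer) row per number one hit.
# These rows are invented for illustration and are not taken from the study.
HITS = [
    (1950, "FirmA", "Singer1"),
    (1950, "FirmA", "Singer2"),
    (1950, "FirmB", "Singer1"),
    (1950, "FirmC", "Singer3"),
    (1950, "FirmD", "Singer4"),
    (1950, "FirmE", "Singer5"),
] + [
    # A more competitive year: eight firms, one hit and one performer each.
    (1960, f"Firm{letter}", f"Singer{number}")
    for number, letter in enumerate("ABCDEFGH", start=6)
]


def four_firm_share(hits, year, top_n=4):
    """Share of a year's hits released by the top_n most productive firms."""
    per_firm = Counter(firm for y, firm, _ in hits if y == year)
    total = sum(per_firm.values())
    if total == 0:
        raise ValueError(f"no hits recorded for {year}")
    leading = sum(count for _, count in per_firm.most_common(top_n))
    return leading / total


def diversity_counts(hits, year):
    """Number of hit records and of distinct performers in a year."""
    rows = [(firm, performer) for y, firm, performer in hits if y == year]
    performers = {performer for _, performer in rows}
    return len(rows), len(performers)


if __name__ == "__main__":
    for year in (1950, 1960):
        share = four_firm_share(HITS, year)
        records, performers = diversity_counts(HITS, year)
        print(year, round(share, 2), records, performers)
```

In this toy example the more concentrated year also has fewer distinct performers, which is the direction of the relationship the study reports; the sketch says nothing about which change comes first.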

By studying conditions of record production and marketing, relating these con- 
ditions to new developments in the communications industry, and examining trends 
in record output and product diversity, Peterson and Berger concluded that changes 
in concentration lead rather than follow changes in diversity, and that this finding 
"contradicts the conventional idea that in a market consumers necessarily get what 
they want" (p. 156). Their study not only highlighted the impact of production- 
organization on musical trends and styles; it also outlined how popular music pro- 
duction is characterized by cycles, and detailed some of the mechanisms that affect 
cyclic development. 

Peterson and Berger's study set the scene from the 1970s onward for the con- 
cern, in popular music studies, with the production system. Negus (1992), for ex- 
ample, has suggested that working practices within the popular music industry are 
linked to an artistic ideology associated with college-educated white males who 
came of age in the "rock generation" of the 1960s and 70s. This occupational strat- 
ification has consequences for the types of pop that are produced: women and un- 
familiar styles and artists, for example, are marginalized (Steward and Garratt 1984). 
Such forms of gender segregation may also be seen in pedagogical settings (Green 
1997), particularly with regard to instrument choice — a topic that overlaps with 
work by social psychologists (O'Neill 1997). 

In the "production" studies discussed so far, the primary methodological strategy
consists of a focus on organizational contexts of musical production, and an attempt
to conceptualize musical work as not so different from other types of work,
insofar as it requires collaboration, resources in the form of materials, conventions, 
and communication. Through this strategy, music's link to social structure is speci- 
fied: musical structures are examined in terms of their links to the local contexts or 
musical worlds in which they are produced, distributed, and consumed. The pro- 
duction perspective thus illuminated the impact of social structure on music in 
highly concrete ways; it highlighted the mundane circumstances under which mu- 
sical work gets done, the circumstances under which careers are forged and styles 
developed and changed. On the heels of the production focus and its attention to 
creative milieux came sociological studies of the construction of both musical value
and reputation. 



The Construction of Musical Value and Reputation 

The stratification of composers, styles, and genres is a rich seam of socio-musical 
research. Historical studies have helped to unveil the strategies by which the musi- 
cal canon and its hierarchy of "Master [sic] Works" was constructed and institution- 
alized during the nineteenth century in Europe (Weber 1978; 1991; Citron 1993) 
and America (DiMaggio 1982). Both an aesthetic movement and an ideology for the 
furtherance of music as a profession, the fascination with "high" music culture dur- 
ing the nineteenth century was simultaneously a vehicle for the construction of class 
and status group distinction. It was also a device of music marketing and occupa- 
tional advancement. 

More recent work in this area has gone beyond the distinction between "high" 
and "low" musical forms. It now includes the issue of how "authenticity" is constructed
and contested (Peterson 1997), dismissing the idea of the "work itself" in
favor of particular configurations of the work in and through particular perform- 
ances (Hennion 1997; see also chapter 5, this volume). And it examines the prac- 
tices and strategies through which particular versions of aesthetic hierarchies are 
stabilized. For example, Hennion (1989) has drawn comparisons between the re- 
cording studio and the scientific laboratory, showing how musical value and scien- 
tific fact are both produced through producers' liaisons with various groups such as 
the public and the media. Similarly Maisonneuve (2001) has focused on the way in 
which the twentieth-century technology of the gramophone afforded music's users 
new and more intensely personal modes of experiencing the love for music. Draw- 
ing upon record reviews, catalogues, liner notes, and other documents, Maison- 
neuve suggests that this technology facilitated a music user actively engaged in con- 
structing her or his tastes and monitoring self-responses. By comparing the two major 
technological revolutions in music distribution during the century, she shows how 
both musical listening and the listening subject were technologically transfigured. 
Her study thus builds upon and gives a new type of spin to William Weber's pio- 
neering work on the emergence of modern musical consumption and notions of 
"music appreciation." Similarly it highlights the extent to which the consumption of 
music involves more than listeners and works, consisting also of networks or, as 
Maisonneuve puts it, "set-ups" of objects, postures, habits, and evaluative discourses. 




Sociological studies of musical value can be regarded as critical or even decon- 
structive in that they suggest that apparently self-evident judgments of inherent 
quality are socially constructed. In my own work on Beethoven's reputation, for ex- 
ample (DeNora 1995), I was interested in the interaction between Beethoven's rep- 
utation and the organizational culture and practices that allowed Beethoven to be in- 
creasingly perceived (and behave) as Vienna's "greatest" composer. This project was 
by no means posed in contradiction to the idea of musical value (as some musico- 
logical critics believed, e.g., Rosen 1996, DeNora and Rosen 1997), but was rather 
concerned with two main sociological issues. The first was how, to be a social fact, 
value of any kind needs to be recognized socially. Unlike gravity or the sound bar- 
rier, artistic value is an institutional fact, not a natural one; hence, if it is to be val- 
ued, music must be socially recognized and institutionalized as valuable, particu- 
larly when it is perceived as violating the norms and conventions that characterize a 
musical field — when in other words, as with Beethoven, its acceptance constitutes 
a significant reorientation of taste. (The point is not to presume there is anything 
automatic about these recognition processes, but to explore them to see how they 
took shape.) The second issue concerned how the musical field was in flux during 
Beethoven's first decade of operation in Vienna, being increasingly transformed in 
ways that were conducive to the perception of Beethoven's "greatness": somewhat 
like a financier, Beethoven gathered increasing means with which to launch increas- 
ingly ambitious aesthetic ventures, while simultaneously augmenting his power 
within the evaluative terrain of that field. In short, I tried to document the funda- 
mentally practical aspects of how one can emerge as a socially recognized "genius," 
so highlighting the way in which genius, as a social fact, emerges from a particular 
configuration of evaluative criteria, aesthetic orientation and convention, social acts, 
discourses, and material culture. The study thus focused on the complex interaction 
between what Beethoven did, what he could do, and how he was perceived. 

Methodologically, the work began with an investigation of three interrelated fac- 
tors: the organizational context of music patronage as Beethoven entered it in 1792, 
his social network as it expanded over time, and his social situation as compared to 
that of some of his competitors. From there, I adapted methods of ethnographic ob- 
servation for use on historical data, focusing on agents and actions within this mu- 
sical field — and specifically on the entrepreneurial activities of Beethoven and his 
patrons as they presented him in contexts that would flatter his talent. Here the data 
were letters, other accounts, and contemporary descriptions of the ways in which 
Beethoven was presented to the public and quasi-public worlds of Viennese musical 
culture. These were, as I have said, highly pragmatic activities accomplished by Bee- 
thoven and his supporters, and they included such things as Beethoven's own nego- 
tiations with the editor of the Allgemeine musikalische Zeitung (a leading music peri- 
odical of the day) and his interventions in the world of piano technology. While the 
study's aims were ultimately sociological rather than musicological (to theorize, via
a case study, key issues concerned with the politics of identity), the book also sought
to highlight the contingent nature of the writing of history: in relation to music 
scholarship, this can be understood as a move away from hagiography and toward 
an ethnomusicological perspective as applied to the canon. 

This line of enquiry has been pursued by sociologists in relation to other art
forms: for example, Heinich's (1996) study of van Gogh's posthumous reputation.
It has also been pursued as a collaborative project between a musicologist (J.-M. 
Fauquet) and a sociologist (Antoine Hennion), in a recent study which argues that 
the present-day understanding of J. S. Bach is a particular "use" of the composer 
within a social context (Fauquet and Hennion 2000). By this they mean that the 
way in which Bach is configured — his value and the ethos for which he is said to 
stand — represents a form of cultural "work": it is a tool with which social realities 
are established and elaborated. The nineteenth-century discovery of Bach and his in- 
stallation as the "father of music," Fauquet and Hennion argue, were also a means of 
configuring the present; Bach's presence was a resource for articulating the meaning 
of what it was to be "modern" (Hennion and Fauquet 2001). In this case the empir- 
ical strategy was anthropological: Fauquet and Hennion followed various musical 
(and musicological) actors as they appropriated Bach and so simultaneously pro- 
duced "Bach" and themselves, defining their own identities in relation to music and, 
through music, to the social world. 



Musical Taste, Consumption and Identity 

By definition, sociological studies of musical value and its articulation address the 
matter of how music is appropriated and how music consumption is linked to sta- 
tus definition. This program is implicit in the work discussed in the previous sec- 
tion, and is in turn buttressed by quantitative studies of arts consumption that doc- 
ument links between musical taste and socioeconomic position. 

In a review of the 1982 national Survey of Public Participation in the Arts, collected
for the National Endowment for the Arts by the U.S. Census Bureau, Peterson
and Simkus (1992) examined arts participation in relation to occupational group (as 
a measure of social status). Their aim was to test the notion, as elaborated in Bour- 
dieu (1984), that there is a direct correlation between high social status and the con- 
sumption of "high" cultural goods. To do this they considered the case of musical 
taste, examining items from the survey that addressed musical genre preferences, and 
attendance at types of music performances. They concluded that, in recent years, 
perhaps particularly in the U.S., the traditional highbrow/lowbrow division of mu- 
sical taste has been transformed in favor of an omnivore-univore model. The latter 
model suggests that individuals with high occupational standing are omnivore-type 
music consumers: they attend and consume a variety of musical genres. Members of 
lower status occupational groups, by contrast, exhibit more restricted taste prefer- 
ences (univores) and are also more likely to defend those preferences vehemently. 
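
One simple way to make the omnivore-univore contrast concrete is to count how many genres each respondent reports liking and to compare the averages across occupational groups. The sketch below does this for invented responses. The group labels, the genre names, and the function mean_breadth are illustrative assumptions; the sketch does not reproduce the survey items or the analysis carried out by Peterson and Simkus (1992).

```python
from statistics import mean

# Hypothetical survey responses: (occupational group, set of genres liked).
# All rows are invented for illustration.
RESPONSES = [
    ("higher_status", {"classical", "jazz", "country", "rock"}),
    ("higher_status", {"classical", "folk", "rock"}),
    ("lower_status", {"country"}),
    ("lower_status", {"gospel", "country"}),
]


def mean_breadth(responses):
    """Average number of genres liked, computed per occupational group."""
    counts_by_group = {}
    for group, genres in responses:
        counts_by_group.setdefault(group, []).append(len(genres))
    return {group: mean(counts) for group, counts in counts_by_group.items()}


if __name__ == "__main__":
    for group, breadth in mean_breadth(RESPONSES).items():
        print(group, breadth)
```

On this reading, an "omnivore" group is one with a high average breadth, and a "univore" group one with a low average; how breadth should be measured in real survey data is a separate methodological question.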

Quantitative modes of analysis have an important place within the sociology of 
music. Representative sampling techniques permit reliable and generalizable por- 
traits of populations, which in turn permit the testing of hypotheses — in this case, 
concerning cultural consumption and social exclusion. But, as with all methods, 
quantitative techniques pose limits, even when practiced at their best. Peterson and
Simkus's work (1992), for example, points directly to questions concerning music
and the construction of self- and group-identity; most of these concern the social-psychological
and cultural aspects of musical consumption and practice: music's
link, for example, with the social identities of its consumers, its role within sub- and
small-group cultures, and its social uses within music-consuming worlds. And no- 
where is this tradition better illustrated than in the pioneering work of Paul Willis 
(1978; see also Frith 1981), with its ethnographically oriented work on the sociol- 
ogy of popular music consumption. 

Willis was concerned with how, in and through musical practice, through situ- 
ated consumption of (and talk about) music, musical structures could be seen to have 
social-organizational properties and capacities. Methodologically, his study drew 
upon participant-observation techniques (see chapter 2, this volume). The great ad- 
vantage of this kind of ethnographic observation is its ability to illuminate the nondis- 
cursive dimensions of action (such as emotions and embodiment) — the very dimen- 
sions overlooked by survey questionnaires and quasi-formal interview techniques 
(and also the dimensions of human existence most closely associated with music and 
musical response). Because of its aims, ethnography is conducted in real time and 
on the social territories germane to the research subjects themselves. If the aim of 
one's research is to understand how music functions, for example, how it inscribes 
social relations, or how it may serve to inculcate modes of agency within social set- 
tings (questions that hark back to Adorno's concerns), then the advantages of this 
approach more than outweigh its practical disadvantages (i.e., that it is labor and 
time intensive, focused on a particular milieu, and not conducive to generalization). 
In particular, ethnography's advantage lies in its holistic focus and the emphasis on 
the emergent and negotiated character of meaning within social settings (Hammers- 
ley and Atkinson 1995). Ethnography, in short, can illuminate music as it functions 
as a resource for meaning construction and for the structuring and organization of 
social settings. 

Describing ethnographic work with two groups of music consumers, the "hip- 
pies" and the "bike boys," Willis made his theoretical and methodological perspective 
clear in the book's appendix, where he emphasized the virtues of participant obser- 
vation and its ability to follow actors in natural environments and situations. When 
allied with other methods, he argued, it provided a means of understanding mem- 
bers' practices and meanings while suspending theoretical notions that might other- 
wise be externally imposed. His study involved "hanging around" with members of 
a motorbike club in an English city, engaging the men in group discussions (tape re- 
corded) where records were played and discussed, and where conversation took off 
without prompting by the researcher; in the same way, Willis investigated the hippy 
scene by visiting three groups at their "pads" and holding similar discussion sessions 
with them. Through this unobtrusive mode of inquiry, held on the respondents' nor- 
mal territory and following their ordinary conversation and action, Willis was able 
to observe how deeply music was implicated in the life worlds of his informants: 
compared to those of the hippies, the preferred songs of the bike boys were fast-paced 
and characterized by strong beats and pulsating rhythms. It is here that we can see 
the great advance of Willis's study, particularly in its handling of the "homology" con- 
cept. While Willis suggested that the preferred music of each group resonated with 
or was homologous to his groups' values and habits of being, his concern was to show 
how the boys themselves established these connections, how they themselves constructed
the links between their preferred forms of music and social life. This point
bears underlining: the structural similarities between music and social organization
documented in Willis's book were forged through the cultural practices and lay clas- 
sifications of the group members. And it follows from this that, as Willis (1978: 193) 
put it, "objects, artifacts and institutions do not, as it were, have a single valency [one 
could read here also 'single social significance']. It is the act of social engagement with
a cultural item which activates and brings out particular meanings." 

In Willis's work, then, we can observe a theory of musical meaning as located 
in the interaction between musical objects and music's recipients; in this respect, 
Willis's work connects with other, more theoretically oriented, perspectives within 
music sociology that conceptualize musical meaning as the result of an interaction 
between music's properties (its mobilization of familiar or "stock" materials, con- 
ventions, styles, gestures) and the ways these properties are received and responded 
to (DeNora 1986; Martin 1995). While emphasizing the social construction of mean- 
ing, then, Willis is by no means dismissive of the ways in which music's specific prop- 
erties may lend themselves with greater or lesser degrees of fit to particular interpre- 
tations and appropriations. In the theoretical appendix to his work (1978: 200-201), 
he describes how cultural items possess "objective possibilities," but suggests that 

The same set of possibilities can encourage or hold different meanings in dif- 
ferent ways. They can reflect certain preferred meanings and structures of 
attitude and feeling. On the other hand, because they relate to something 
material in the cultural item, something specific, unique and not given from 
the outside, the "objective possibilities" can also suggest new meanings, or 
certainly influence and develop given meanings in unexpected directions. 
This uncertain process is at the heart of the flux from which the generation 
of culture flows. The scope for the interpretation or influence of the "objective 
possibilities" of an item is not, however, infinite. They constitute a limiting as 
well as an enabling structure. It is also true that what has been made of these 
possibilities historically is a powerful and limiting influence on what is taken 
from them currently. 

Willis's work demonstrates that if our aim is to understand music's social sig- 
nificance and dynamic relationship to social structure, we need to move beyond an 
exclusive concern with "the music itself" and investigate the processes of its reception
and use. This line of thinking has been developed by sociologists of other cultural
media: literature (Griswold 1986), television (Moores 1990), and theatre (Tota
1997). Across these studies, attention has been devoted to the more general fabrication
of meaning and aesthetic response (including nonverbal response) through interaction
with cultural texts, in ways that are directly linked to identity and world
construction. The observation that agents attach connotations to things, and orient 
to things on the basis of perceived meanings, is a basic tenet of interpretivist sociol- 
ogy. But its implications for theorizing the nexus between aesthetic materials and so- 
ciety are profound. It signals a shift in focus from aesthetic objects and their content 
to the cultural practices in and through which aesthetic materials are appropriated 
and used to produce social life. And with this shift, we have moved from the cultural 
constructionism characteristic of recent trends in musicology (as described at the 
outset of this chapter) to the interactionist constructionism of sociology proper.

In Willis's study, "the boys" are seen as interpretatively active; their group values
are "almost literally seen in the qualities of their preferred music" (p. 63). The focus 
is directed at the question of how particular actors make connections or, as Stuart 
Hall later put it, "articulations" (1980, 1986) between music and social formations. 
This approach grounds the concept of homology by focusing on the way in which 
homologies are created (articulated) and experienced, rather than seeing them as 
inherent in the relationship between pre-given musical texts and pre-given social 
contextual factors. The further development of this perspective is arguably one of
sociology's greatest contributions to the understanding of culture, insofar as it has 
provided concepts and descriptions of how aesthetic materials come to have, as 
Willis puts it, social "valency" in and through their circumstances of use. And to see 
how this valency is produced, ethnographic methods of observation are required — 
methods that, through their very time-intensity, allow the researcher to observe articulations
in the making, in real time and within naturally occurring situations.

In the two decades that have followed the publication of Willis's book, the field 
of audience and reception studies has advanced considerably. But the early interac- 
tionist promise of the classic works of Willis, Frith, and Hall is too often neglected 
in favor of a preoccupation with the specifics of one or another interpretation of a 
particular cultural work. The great contribution of these writers was their focus on 
what the appropriation of cultural materials achieves in action, what culture "does" 
for its consumers within the contexts of their lives and how these processes can be 
observed ethnographically. Thus one of the most striking (and usually underplayed)
aspects of Willis's study is its conception of music as an active ingredient of social 
formation. The bike boys' preferred music didn't leave its recipients "just sit[ting]
there moping all night" (1978: 69): it invited, perhaps incited, movement. As one of 
the boys put it, "if you hear a fast record you've got to get up and do something, I 
think. If you can't dance any more, or if the dance is over, you've just got to go for a 
burn-up" (1978: 73). 

Willis's work was pioneering in its demonstration of how music does much 
more than "depict" or embody values. It portrayed music as active and dynamic, as 
constitutive not merely of "values" but of trajectories and styles of conduct in real 
time. It reminded us of how we do things to music and we do things with music:
dance and ride in the case of the bike boys, but beyond this, work, eat, fall asleep, 
dance, romance, daydream, exercise, celebrate, protest, worship, meditate, and pro- 
create with music playing. As one of Willis's informants put it, "you can hear the beat 
in your head, don't you . . . you go with the beat, don't you?" (1978: 72). As it is 
used, both as it plays in real time and as it is replayed in memory, music also serves 
to organize its users' actions and experiences. 



Musically Inscribed Music Consumers 

Studies such as those conducted by Willis and Frith during the 1970s have proposed 
that, for the sociology of music, one of the most fruitful analytic strategies is the 
focus on musical practice. In recent years, the ethnographic focus on musical consumption
and musical practice has embraced sociological questions concerned with
collective behavior and social institutions, as well as questions concerned with the
emotions and embodiment. 

A common thread running through nearly all of the new sociology of music is 
the concern with music as a resource for social action and for agency broadly con- 
ceived. Within social movement theory, for example, music has been conceptualized 
as providing "exemplars" or models within which social action and movement activity
is constructed and deployed (Eyerman and Jamison 1997). In this respect, music
provides, as earlier ethnographers of musical subcultures suggested, a resource for 
articulating meanings that apply beyond the sphere of music itself. Following actors 
ethnographically, as they explain themselves in terms that make reference to music, 
and as they compare themselves or their action styles to musical works, shows how 
music may actually "get into" action in specific ways, how it functions as an analogue 
or paradigm for action and cognition. This perspective develops the assumption outlined
by Willis's work on the bike boys, that music provides homologous resources
for imagination and conduct. This is saying much more than that there are parallels 
between music and social forms; it is saying how such parallels are drawn and acted 
upon: how, as Middleton puts it in his description of Lévi-Strauss, music comes to
offer "a means of thinking relationships ... as this note is to that ... so X is to Y"
(Middleton 1990: 223). Examining music as it provides media for building social and
conceptual relations both extends and operationalizes Attali's (1985) vision of music's 
"annunciatory vocation," its ability to presage social structural developments. It does 
this by shifting sociomusical interrogation away from a focus on "reciprocal interac- 
tions," homologies or structural similarities between "music" and "society" (as if these 
were two distinct realms): instead, it directs focus to the interactive relationship be- 
tween music and social activity, music and interpretation. This is a pragmatic ap- 
proach to the topic of musical meaning, one that sidesteps the text/context dichotomy 
(and the idea of the musical object) in favor of a notion of music as it is drawn into 
and becomes a resource for action, feeling, and thought. 

This focus on music as resource has recently been applied to the question of sub- 
jectivity and its cultural or social construction (Hennion 1993, Gomart and Hennion 
1999, Bull 2000, DeNora 2000, DeNora 2001). Here music is portrayed as a re- 
source for the production and self-production of emotional stances, styles, and states 
in daily life, and for the remembering of emotional states. The predominant meth- 
odological strategy within this work has been the ethnographic interview, designed 
to uncover, in the first instance, musical practices of the kind that often pass unnoted 
by respondents — for example, whether they listen to certain types of music in par- 
ticular circumstances but not in others, or whether they ever choose to listen to 
works to realign their emotional or energy state. Although this work clearly connects 
with research in social psychology (Sloboda 1992, Sloboda 2000) and ethnomusicology
(Crafts, Cavicchi, and Keil 1993), it also indicates an explicitly sociological
focus on self-regulatory strategies in particular social contexts. It reveals some of 
the ways that individuals and groups engage in emotional management and in self- 
production across a range of circumstances. 

Concurrent with sociological studies of music and emotion management, there 
has been a renewed interest in music's effect on and relation to the body. This focus 
moves beyond the interest, within musicology, in body imagery (Leppert 1993, Walser
1993) to a concern with bodily praxis and bodily phenomena. In this respect, it connects
with recent work by music scholars on the topic of performance "ergonomics"
and the socially communicative body in performance (viewing music performance as 
just one type of social performance [Clarke and Davidson 1998]), and with work on 
the body as implied and afforded by musical form (McNeill 1995). In keeping with 
sociology's emphasis on the situated construction of musical response, new research 
by sociologists on how music may be understood to mediate corporeal states (such 
as energy, coordination, entrainment, and bodily self-awareness) downplays a con- 
ception of music as stimulus and highlights instead music's capacity to "afford" — 
to provide resources for and to enable forms of corporeal organization and states of 
being. In its focus on music's connection with modes of being and modes of attend- 
ing to the social environment, this work connects with Schutz's classic emphasis on 
the phenomenological dimension of music making (Schutz 1964). 

These issues can be illustrated through a study of my own on the role played by 
music within fitness classes (DeNora 2000: 88-102). The research site — the aero- 
bics class — was chosen because, given the music-led, choreographed character of 
aerobic exercise, it provided a venue in which music's role in relation to bodily phe- 
nomena (energy, stamina, pain perception, coordination, and motivation) was criti- 
cal, and where it could, potentially, be observed. The central aim of the research was 
to illuminate the way in which music structures physical activity and the subjective 
dimension of that activity. To that end, the study was designed to observe what was 
conceptualized as "human-music interaction" — the points where music came to 
serve as an organizing device for bodily activity. It drew together a range of meth- 
odological strategies, employed in the following order with overlap between the dif- 
ferent types of data collection: participant observation of fitness sessions (primarily 
"hi/lo" aerobics; the research was undertaken by Sophie Belcher, the extremely fit 
research assistant); in-depth interviews with music producers; in-depth interviews 
with class instructors and class members; and quick questionnaires, administered to 
class members. 

Given the aims and subject matter of the study, participant observation was a crit- 
ical investigative technique. As with most embodied practices, there are many things 
about aerobics that one can only know about by doing. Being physically stretched, 
for example, experiencing "the burn," sweating and tuning into the rhythm of a ses- 
sion, feeling at the point of fatigue and then re-energized when the music changes, 
wanting to move with gusto to the musical pulse — these are all experiential matters. 
The first form of data in this study thus consisted of the (junior) researcher's own ex- 
perience of exercising to music, her "knowing-by-doing." The second form of data 
was the record provided by the videotapes of each session. These enabled the re- 
searcher not only to recall the embodied experience of class sessions, but also to see 
and freeze otherwise fleeting and subconscious moments of class experience, to play 
them back and so enable reflection upon what it was about the music that enabled 
or constrained forms of physical activity. This reflection was facilitated through con- 
versations (in-depth interviews) with the senior researcher (myself), such that the 
research assistant was, simultaneously, researcher and key informant in the study. 
These conversations (analytically oriented debriefing sessions) in turn generated
hypotheses and ideas for further observation. Key among these was the strategy of
examining "breakdowns" in sessions and of comparing "good" sessions with "bad"
ones, that is, sessions characterized by a high degree of aerobic order with sessions 
where such things as fatigue, lack of coordination, and boredom occurred. 

The third form of data came from interviews with professional aerobics music 
providers. Here the focus was on what these providers said about music's features 
and their usefulness for exercise. The data from these interviews were compared 
with what class participants themselves thought about particular musical numbers 
and passages, types of movement, and their associated motivational states. 

The research highlighted ways in which specific musical devices were enabling 
or constraining at certain points in the aerobic session. The key point was that some 
of these devices were effective for some aspects of the exercise session but not for 
others, and this finding helped to highlight music's structuring properties in relation 
to the body and embodied activity. From there, the key research question became 
why certain features were effective at certain stages of the aerobic process. Analysis 
of all the data, and particularly the videotapes, suggested that music could be seen 
to work with and for the body (and against the body) by profiling bodily movement, 
by entraining movement, and by modeling and enabling the adoption of motiva- 
tional stances (and energies) appropriate to different segments of the session. So, for 
example, slower-paced, more "lyrical" formats were useful for the stretching move- 
ments of the warmup, while music with a highly prominent beat and powerful or- 
chestration (e.g., lower brass tutti) was useful during the core of the session charac- 
terized by a vigorous movement style. The study concluded that music could serve 
as a "prosthetic technology" (Ehn 1988: 399) of the body, a device that has the ca- 
pacity to extend and restructure bodily phenomena, including embodied states such 
as emotion and motivation. This capacity is by no means confined to the totalizing 
environment of the exercise session, but can be perceived across a range of settings 
in daily life — in the workplace, within organizations (Lanza 1995), and in com- 
mercial environments such as restaurants and shops. 

Indeed, this study was followed by an ethnography of music in retail outlets, 
with an explicit focus on these outlets' attempts to configure modes of agency (here 
understood as predispositions for and styles of action or subjectivity) by configur- 
ing the sonic environment (DeNora 2000, chapter 5). Overall, we were interested in 
how shops used music to target preferred types of consumers, and to structure the 
temporal and other aspects of the environment; and we were also interested in how 
shoppers interacted with music in-store — for instance whether they noticed it and, 
if so, what they thought about it. As with the aerobics work, our own autobiographi- 
cal experiences in relation to in-store music were used as a basis for generating inter- 
view questions and as a ground against which to analyze the in-store conduct of other 
shoppers. 

To these ends, and with the permission of the stores in question, the research 
assistant and I posed as shoppers to observe the scene in-store, and in particular to 
take note of (and compare) in-store ambience and the conduct of other shoppers. 
With tape recorders held unobtrusively in our hands (they passed as personal stereos)
and clip-on microphones on our coat collars, we simultaneously recorded the in- 
store soundtrack and our various observations about the conduct of other shoppers 
(such as whether shoppers showed signs of engaging with the music by singing along, 
snapping their fingers, or making dance movements). We combined this with
semi-structured interviews with shop managers and staff about their music policies and
how the music seemed to work in-store, as well as exit interviews with consumers 
as they left the shops (we did not have permission to speak to shoppers in-store). In 
addition, as a pilot study, we followed volunteer shoppers whom we "wired for 
sound": we asked them simply to "think out loud" as they moved through the shop, 
commenting on anything that came to mind and anything they might notice about 
the music in particular. Simultaneously, we shadowed these shoppers, one-on-one, re- 
cording our own observations about their behavior (e.g., "she is looking at an orange 
jumper now"). The two tapes could be synchronized precisely because they shared 
the same musical soundtrack, and this enabled us to overlap the two transcripts. 
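
The synchronization described above amounts to finding the time offset at which the music shared by the two recordings lines up best. The following is a minimal sketch of that idea in Python, offered as an illustration only and not as the procedure used in the study, which is not described in detail. The function name `best_lag`, the toy loudness values, and the choice of a least-mean-squared-difference criterion are all assumptions made for the example.

```python
def best_lag(a, b, max_lag):
    """Return the lag (in samples) at which sequence b best matches sequence a.

    A positive lag means b started recording later than a, so that b[i]
    corresponds to a[i + lag]. The match criterion is the mean squared
    difference over the overlapping region (smaller is better).
    """
    best, best_error = 0, float("inf")
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(a[i + lag], b[i]) for i in range(len(b)) if 0 <= i + lag < len(a)]
        if not pairs:
            continue  # no overlap at this lag
        error = sum((x - y) ** 2 for x, y in pairs) / len(pairs)
        if error < best_error:
            best, best_error = lag, error
    return best


# Hypothetical loudness envelope of the in-store music (illustrative values only).
music = [0, 1, 0, 3, 5, 2, 0, 4, 1, 0, 2, 6, 1, 0]
tape_researcher = music[:]   # recording that started with the music
tape_shopper = music[3:]     # recording that started three samples later

lag = best_lag(tape_researcher, tape_shopper, max_lag=5)
```

Once the lag is known, the time stamps in one transcript can be shifted by that amount so that the shopper's comments and the researcher's observations sit on a common timeline.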



Conclusion 

I have sought, in this chapter, to unpack the sociology of music's toolkit, and to fea- 
ture, in particular, the tools designed for the exploration of musical practice; some 
of these tools are new, and their utility over the long term is yet to be determined. I 
have also sought to highlight how the sociology of music currently elaborates a par- 
ticular version of constructionism, one that takes as its object of analysis music as it 
is made, used, and responded to within specific contexts and settings. Sociologists 
of music have drawn upon a range of empirical strategies for collecting and analyzing
data on music's social role, from analyses of networks and the impact of associations
and information exchange on music-stylistic choices, to comparative analyses
of institutional and organizational structures of music making and their influence on 
musical works and producers. Survey methods have been used for mapping musi- 
cal participation and taste, and in-depth interviews for exploring reception issues; 
ethnographic methods have been adopted for the examination of music as it is in- 
volved and mobilized in culture creation, group culture, and the noncognitive di- 
mensions of social being and social life. 

Over the past three decades, the sociology of music has shifted from its status 
as a somewhat abstract endeavor located on the margins of sociology, to a grounded 
and empirically oriented mode of enquiry directed to many of sociology's core con- 
cerns — social structure, consciousness, and social difference and division. In un- 
dergoing this change, the sociology of music has not only been empowered within 
sociology as a whole, but has simultaneously retooled in ways that are significant for 
musicology as that field develops toward an understanding of music as a funda- 
mentally social enterprise. 
CHAPTER 4



Music as Social Behavior 



Jane W. Davidson



Introduction 

In the vast majority of music-making contexts, the real or implied presence of oth- 
ers means that at some level social communication or interaction takes place: singing 
a lullaby, a work song, a hunting song, or a school song; chanting as the member of 
a football crowd; participating as either a musical performer or a spectator in a sym- 
phony orchestra concert or at a Hindu wedding. In fact, individual practice is one of 
the rare musical occasions when there is no involvement with a co-performer or 
spectator, but even here there is generally a social goal: the preparation of a per- 
formance. Recordings might seem to be another exception, but the social element is 
still implied: there is a need to communicate the musical content to someone else, 
even if for the duration of the recording the audience is imaginary. Music is a social 
act, but investigating how social behaviors function in different musical contexts, 
and what significance they have, is a very recent research interest in psychological 
approaches to Western music (for an overview see Hargreaves and North 1999). The 
delayed development of social psychological research seems to be the result of a 
largely reductionist approach to music which has tried to understand it in terms of 
its structural elements: melody, harmony, rhythm, and so forth. But, as general in- 
terest in issues related to attitudes and beliefs, and individual and group behavior, 
has grown, so too has the interest in music as a social-behavioral phenomenon. 

Within the psychology of music, Farnsworth (1954) was one of the first to ex- 
hibit an explicit interest in such issues, arguing that it was not sufficient to look at 
how a song functioned musically; rather it was important to know how the per- 
forming context operated, and how it affected both performer and audience. Anec- 
dotal accounts can go some way toward describing social behaviors, but the moti- 
vation behind the work of researchers such as Farnsworth was to undertake more 
systematic investigations. They wanted to generalize from their observations, mea- 
suring the frequencies of musical behaviors and the interrelationships of these be- 
haviors within and across individuals. Thus they adopted quantitative research de- 
signs employing statistical techniques in the analysis of data. By the 1970s and 
1980s, however, experimental studies were increasingly complemented by work influenced
by the writings of theorists like Harré (1979, 1992a, 1992b), who demonstrated
that controlled manipulation under experimental conditions was not always
an appropriate methodology when looking at beliefs and behaviors. Out of these 
kinds of theoretical discussion emerged New Paradigm Research, which adopted 
qualitative research techniques such as in-depth semistructured interviews and participant
observation in an attempt to capture the subjectivity of individual experience
(Smith, Harré, and van Langenhove 1995).

As a consequence, there is now a much more diverse palette of research tech- 
niques available for investigating music as a form of social behavior than there was 
at the time of Farnsworth's work. At one extreme, there is the tradition based on the 
reductionist approach which manipulates conditions so that specific variables can 
be examined. An example of this is Burland and Davidson's (2001) study of the ef- 
fects of different social groupings (single sex, mixed sex, friends, strangers, and class- 
mates) on the quality of musical compositions, exploring the effects of different so- 
cial combinations on problem-solving in a musical composition task; without the 
controlled manipulation of the independent variable (the social group in which the 
children were working), the study would not have been able to demonstrate the im- 
pact of social processes on the dependent variable (the musical composition). 1 Formal 
questionnaires, surveys, and tests have also been used. For instance, Finnas (1987) 
wanted to explore the musical preferences of young people; in order to do this, he 
designed an experiment in which participants listened to and then rated excerpts on 
a preference scale, a typical quantitative design whose data are then subjected to statistical
analysis. Since Finnas believed that most participants of this age group would pre- 
fer rock over other forms of music, he designed the experiment in such a way that 
participants always had to compare a test excerpt (a piece of classical music, for in- 
stance) with a rock item, so that the relative preference for an item in relation to the 
best liked piece could be determined. 
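
The logic of this paired design can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reconstruction of the original scoring: the 1-7 scale, the ratings, the genre labels, and the function name `relative_preference` are all hypothetical, and the difference score used here is only one plausible way of expressing preference relative to the rock item.

```python
from statistics import mean

# Hypothetical ratings on a 1-7 preference scale.
# Each pair is (rating of test excerpt, rating of the paired rock item).
ratings = {
    "classical": [(3, 6), (2, 7), (4, 6)],
    "folk": [(5, 6), (4, 5), (6, 6)],
}


def relative_preference(pairs):
    """Mean difference between a test excerpt and its paired rock item.

    Negative values mean the test excerpt was liked less than rock.
    """
    return mean(test - rock for test, rock in pairs)


relative = {genre: relative_preference(pairs) for genre, pairs in ratings.items()}
```

Because every excerpt is scored against the same kind of reference item, the resulting values can be compared across genres even when participants use the rating scale differently.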

Other investigations have combined qualitative and quantitative methods. 
Davidson and Scutt (1999) used a quantitative measure of musical achievement (a 
musical examination grade) as an objective point around which to assess the role of 
students', parents', and teachers' beliefs and practices in preparing for these exami- 
nations, collected by means of qualitative interviews. By contrast, work of an entirely 
qualitative nature, like that of Kanellopoulos (1999), demonstrates the New Para- 
digm approach in which the researcher plays an active role in the generation and in- 
terpretation of the data. 2 Kanellopoulos wanted to know how a class of children 
worked on free improvisation. The analysis emerged from examining the content of 
conversations Kanellopoulos had with the class of children, semistructured inter- 
views, and tape recordings of the musical and social interactions during the improv- 
isation process. Kanellopoulos also became a co-improviser, attempting to enter the 
children's world by participating in the creative process alongside them. The data re- 
vealed the critical role of social context, with classmates and the teacher assisting in 
the development of musical improvisation skills. 

In summary, a broad spectrum of research approaches has developed, each ap- 
proach serving a slightly different purpose. In Burland and Davidson's study, only 
a short period of time was available in which to explore the impact of social group 
on the ability to compose, and so the most effective approach was an experiment 
that deliberately formed groups and so manipulated the social context in the study. 
For Finnas, rating scales produced easily quantifiable data in response to a straight- 
forward question: what are young people's musical preferences? Davidson and Scutt 
used data of a more objective nature and compared them with reports of thoughts, 
feelings, and practices in the preparation, execution, and follow-up periods to a 
public examination; these longitudinal qualitative data permitted emergent social relationships
and key themes to be described and explored over a six-month period.
Finally, Kanellopoulos undertook qualitative interviews similar to those of Davidson 
and Scutt, but involved himself fully in the process, reflecting on his own input as 
teacher and guide to the children. 

The aim of this chapter is not only to illustrate the breadth of social-behavioral 
research, but also to look in detail at how music has been analyzed as a form of so- 
cial behavior, and to pinpoint why particular methods of analysis might be appro- 
priate to specific research questions. It will not be possible to provide an overview 
of all the forms of social musical behavior, nor to explore the vast range of strategies 
that might be employed to examine research questions. Instead, the focus will be on 
a number of specific areas from within the Western art tradition: the music learning 
environment; social influences on musical behaviors; individual differences in mu- 
sical behavior; and the social context of musical performance. 



Social Factors in Musical Skill Acquisition 

Research techniques applied to questions of skill acquisition have drawn on a whole 
spectrum of approaches, from large-scale quantitative questionnaire studies to de- 
tailed qualitative case studies. Two contrasting research approaches are considered 
here: the first involves the collection of biographical data to provide quantitative and 
generalizable results, while the second considers material focused on the individual 
and derived from a longitudinal case study. 

The Biographical Survey 

If you want to find out how musical skills are acquired within a social context, mu- 
sicians' biographies can be an effective source of information. Biographies are useful 
because they can trace the factors, social and otherwise, that have led to musical 
achievement. In terms of existing social-psychological research in music, retrospec- 
tive biographical accounts have provided the major source of information about key 
influences on the development of musicians, particularly those of prodigious achieve- 
ment. Such research has been undertaken in a variety of ways, but principally 
through interviews in which parents and children talk about their lives and what 
happened at particular stages of their musical development (e.g., Sloboda and Howe 
1991). This is not an optimal research technique, since human memory is notoriously 
unreliable, but it is often possible to verify retrospective accounts at least in part by 
asking how, for instance, a child was practicing at a certain time, or looking for data 
of a more objective nature such as the dates of examinations and grades achieved. 

Biographical research has clear and established precedents in historical musicol- 
ogy which can provide a useful resource for subsequent analysis. Lehmann (1997), 
for instance, carried out an analysis of historical documents relating to famous indi- 
viduals, cross-referencing a variety of texts to provide both confirmatory evidence 
and extra detail. While this is a useful method of collecting data, it is also problem- 
atic. Historical documents can be unreliable, with accounts from similar periods often 
being summaries of one another and no opportunity available to fill the gaps in what
may be an incomplete record. Furthermore, the accuracy of the data can be questionable
when they rely on second-hand reports written many years after the actual events.
Despite such drawbacks, however, primary source material such as the considerable 
correspondence between Clara and Robert Schumann (Weissweiler 1984) — or the 
diaries of Berlioz (Searle 1966), which tell of concert life and the role of various in- 
dividuals in the development of his musical skills — are important documentary re- 
sources, rich in self-reflection and insight. And similar kinds of diary data have been 
widely used in socio-behavioral enquiries. For instance, in research on pregnancy, 
Smith (1994) gained all manner of insights into a woman's sense of changing self 
from the diary she kept during the course of her pregnancy. But the key to under- 
standing the free accounts in the diaries was the follow-up interview: by asking sim- 
ilar questions of all the women participating in his study, Smith was able to under- 
take a comparative analysis of both sets of data. The interviews provided information 
that then permitted all the material to be systematically structured and thematically 
arranged. This, obviously, is not possible in the case of historical data. 

In looking at the events underlying musical development, an alternative to the 
historical biography is a real-time survey over the life-span of many individuals. Re- 
searching music over the life span is a tricky business, since in Western culture not 
everyone learns an instrument, and a very large population survey would be needed 
to ensure the inclusion of a sufficient number of participants who developed musi- 
cal skills and then progressed to high levels of accomplishment. Sloboda went some 
way toward collecting this kind of data by adding a section of questions on music to 
the Mass-Observation Project based at the University of Sussex, which involves 
around 500 people in an ongoing recording of their daily lives (Sheridan, Bloome, 
and Street 1998). In autumn 1997, data on individual involvement with music were
sought, with family and other influences being explored in order to find out what
proportion of the population engage in music and how this process occurs. Al- 
though some elements of these data have been reported by Sloboda in conference 
proceedings (1998), the main analyses have yet to be undertaken as they are de- 
pendent on further data collection. 

It is generally very difficult to find ways of separating out the many potential 
factors that may have an effect on an individual or a social group. The next section 
shows how this difficulty can be addressed through biographical research projects 
involving empirical methods and comprehensive data collection. There are many 
possible research techniques, but I discuss the two principal techniques used in be- 
havioral research: the quantitative and the qualitative questionnaire. 

Developing a Quantitative Biographical Questionnaire Study 

In most quantitative, survey-style research, considerable effort is put into making 
sure that the right sorts of question are asked so that the data collected are appro- 
priate to the particular aims of the investigation. While reviewing the relevant liter- 
ature can help to give a preliminary indication of potentially valuable lines of en- 
quiry, it is common for a small number of in-depth interviews to be undertaken prior 
to the main questionnaire, so that the focus of questions can be clarified and developed.
At this stage, new issues that emerge can be easily integrated into the design.
Similarly, there is often a phase of "piloting" the questions and response categories 
in order to make sure that the data are collected in an appropriate format for analy- 
sis, as well as to identify the relevant statistical analysis techniques. Once the ques- 
tionnaire and analysis technique have been developed, individuals from the sample 
to be surveyed are often given "trial" interviews to make sure that they can deal with 
both the types of questions proposed and the interview environment itself. 

The fine details of preparing questionnaires can be found in Oppenheim's (1992) 
text on the topic. However, a useful set of examples is provided by a series of studies
in which a small-scale biographical study was used as a pilot to develop a major
investigation of young musicians with a variety of backgrounds and interests in music 
(Howe and Sloboda 1991a, Howe and Sloboda 1991b, Sloboda and Howe 1991, 
Davidson, Howe, and Sloboda 1995/1996, Davidson, Howe, Moore, and Sloboda 
1996, Davidson, Howe, Moore, and Sloboda 1998). Initially, Sloboda and Howe 
wanted to find out what social and other factors such as motivation, practice habits 
and so on were shared by musicians of prodigious talent; the intention was to un- 
ravel the issue of how far musical ability is the product of nature or nurture. The re- 
searchers chose to survey 42 young musicians of between 8 and 18 years of age, who 
were all in specialist musical education and had musical accomplishments well be- 
yond those expected of a normal child. This number of children was chosen so that 
shared factors could be more easily identified and grouped: it would be difficult, for 
instance, to say that a particular social factor was influential in the development of 
musical ability if only two out of three interviewees shared it. The 42 children were 
statistically representative of their group, and were sufficiently numerous to gener- 
ate statistically analyzable results relating to the behavioral influences on young, tal- 
ented musicians in Britain. 
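
The point about sample size can be made concrete by comparing the uncertainty around the same observed proportion in a very small and a larger sample. The sketch below uses the Wilson score interval, a standard approximation chosen here for illustration; it is not a method attributed to the researchers, and the counts (2 of 3, and 28 of 42) are hypothetical.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical counts: the same two-thirds proportion in two sample sizes.
small = wilson_interval(2, 3)     # two of three interviewees
large = wilson_interval(28, 42)   # twenty-eight of forty-two interviewees

small_width = small[1] - small[0]
large_width = large[1] - large[0]
```

With three interviewees the interval is so wide that it is compatible with the factor being rare or nearly universal; with 42 it narrows enough to support a cautious claim about the group.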

Sloboda and Howe initially worked from the existing research literature on high 
achievement in order to identify a number of key areas for the interviews: the im- 
pact of parents, teachers, siblings, and role models. They also wanted to find out in- 
formation that was specific to the musical tasks, such as how often the children prac- 
ticed and what examinations they had taken. Having identified these broad areas of 
interest, they then constructed an open-ended interview schedule — that is, a sched- 
ule that would be delivered in an informal conversational style. In open-ended ques- 
tioning of this kind, the interviewer needs to know the interview topic areas very 
well, and be prepared to go along with the flow of a conversation from one topic to 
the next at the participant's pace; he or she needs to pursue areas that may emerge 
spontaneously in the interview, and that may end up providing important insights 
into the participant's thoughts about an issue. Despite the open nature of these in- 
terviews, however, Sloboda and Howe made sure that they began with general, non- 
invasive areas of questioning, and then as the interview went on, proceeded to more 
specific questions, with the ultimate aim of finding out more personal or potentially 
sensitive information. As Smith, Flowers, and Osborn (1997) have pointed out, this 
kind of funneling of questions allows for a rapport to be established between inter- 
viewer and interviewee prior to tackling questions of a personal nature. 

After carrying out their interviews with the young musicians in this semi- 
structured manner (one-to-one and tape recorded), Sloboda and Howe developed a 
second schedule for the musicians' parents, who were all interviewed independently.
Each interview lasted approximately 40 minutes and full transcripts of all interviews 
were made. At this stage, the researchers examined all the data for common themes, 
and for any corroboration between the parent and child interviews that might vali- 
date what had been said; this approach produced more themes and topics than the 
researchers had initially anticipated, resulting in a rich data set that was highly spe- 
cific to the individuals interviewed. 

Once they had grouped the data according to theme, the researchers undertook 
statistical analyses of the frequency of different behaviors and the correlations be- 
tween practice and achievement, exploring those factors that contributed most to 
these children's musical progress. These statistical analyses (reported in Sloboda and 
Howe 1991) were used to show that the children they interviewed were significantly 
supported by social networks of family, friends, and teachers. Although the data 
were dependent on retrospective memory, and for that reason potentially unreliable, 
the independent interviews with children and parents allowed for differences in rec- 
ollections to be accounted for. Moreover, by interviewing children still in education, 
the interviewers made a deliberate attempt to collect data as near as possible to the 
time that the relevant events occurred. 
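
A correlation between practice and achievement of the kind mentioned above can be computed as follows. This is a minimal sketch assuming a Pearson coefficient and invented figures; the source does not specify which statistic was used, and the variable names and values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: weekly practice hours and most recent examination grade.
practice_hours = [2, 4, 5, 7, 9, 10]
exam_grade = [3, 4, 4, 6, 7, 8]

r = pearson_r(practice_hours, exam_grade)
```

A coefficient near 1 indicates that children who practiced more tended to hold higher grades; it does not by itself show that practice caused the achievement.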

In the three publications that resulted from their biographical investigation of 
the 42 children (Howe and Sloboda 1991a, Howe and Sloboda 1991b, Sloboda and 
Howe 1991), Sloboda and Howe highlighted different elements of the data. The first 
paper looked for quantifiable results to show trends for the particular group of in- 
dividuals investigated, while the subsequent two papers dealt much more directly 
with the data as they were collected, for instance providing extensive quotes to il- 
lustrate how children from professional musician families saw music as an integral 
part of their lives. In a text on research techniques in the social sciences, Robson 
(1993) notes that some of the best research is that which combines the generaliz- 
ability of statistical findings — for example, that 73% of those interviewed had sup- 
portive parents — with the particularities of individual cases. The study reported 
above, however, was not only an end in itself, but also provided pilot data for a much 
larger-scale and more formally structured questionnaire study which combined past 
and current accounts of musical engagement. 

Developing and refining the research questions and expanding the population 
to be surveyed to include children who had given up music as well as those who had 
music as a second or third hobby, Sloboda and his colleagues developed a formal 
multiple-choice questionnaire suitable for 45-minute interviews with over 250 chil- 
dren and their parents. Like the initial study, this larger-scale survey involved inter- 
viewing all children and parents individually so that comparisons between the data 
could be made. Besides the more formal retrospective interviews concerning family, 
sibling, and teacher influences, the children were asked to maintain diaries of their 
lessons, practicing, and other forms of engagement with music. These diaries pro- 
duced a more accurate survey of what children were actually doing by reducing the 
problems of reconstruction and retrospection. 

One group of children investigated by the researchers (those who had aspira- 
tions to become professional musicians but were not receiving a specialist musical 
education) came from a wide geographical spread, so postal survey techniques were 
adopted in order to collect data from them. Oppenheim (1992) observes that it is
not untypical for return rates on postal surveys to be as low as 15%. Such low re- 
turns may be a depressing prospect, but an astute researcher can increase potential 
return rates in a number of ways. First and most obviously, material rewards such as 
a free CD can be offered to participants, putting them under a certain obligation to 
respond. Second, the questionnaire should be designed and expressed in such a way 
that the participants feel that what is being requested of them will be of value to the 
research. Third, the questionnaire should not be too long or complex. Careful pilot- 
ing of how questions are constructed and set out on the page is just as important as 
the content and relevance to the potential respondent: for instance, a questionnaire 
might begin with easy, general questions and work toward more specific and "diffi- 
cult" ones (such as those that are of a personal nature or require a lengthy written 
description as opposed to a one-word response). Sloboda and his colleagues meticulously
piloted their postal questionnaire, modifying wordings of questions and response
sheets in the light of their face-to-face interviews, and their careful planning
resulted in a return rate of around 50%.
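
The practical consequence of these return rates is easy to work out. The sketch below estimates how many questionnaires would need to be posted to obtain a given number of completed returns; the target of 100 responses and the function name `questionnaires_to_send` are hypothetical, and only the 15% and 50% rates come from the text.

```python
from math import ceil


def questionnaires_to_send(target_responses, return_rate):
    """Number of questionnaires to post in order to expect the target number back."""
    if not 0 < return_rate <= 1:
        raise ValueError("return_rate must be greater than 0 and at most 1")
    return ceil(target_responses / return_rate)


# Hypothetical target of 100 completed questionnaires.
pessimistic = questionnaires_to_send(100, 0.15)  # typical low postal return rate
achieved = questionnaires_to_send(100, 0.50)     # rate reported after careful piloting
```

Under these assumptions, careful piloting cuts the required mailing from 667 questionnaires to 200.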

This large-scale study demonstrated that children who regarded musical par- 
ticipation as a second or third hobby alongside sports or other activities, or gave up 
musical participation completely, had entirely different family, school, and peer cir- 
cumstances compared with those children who attained high levels of achievement. 
From this, the researchers were able to gain some insight into the role of environ- 
mental influences in the development of musical ability, which in turn enabled them 
to engage in a much broader theoretical debate concerning the relative roles of na- 
ture and nurture in musical development. (These specific debates can be found in 
Sloboda, Davidson, and Howe 1994a, Sloboda, Davidson, and Howe 1994b, Howe, 
Davidson, and Sloboda 1998a, Howe, Davidson, and Sloboda 1998b.) 

In summary, Sloboda and his colleagues aimed their research at discovering 
general characteristics and overall trends, and they used the semistructured inter- 
view primarily as a starting point from which to develop a fully structured ques- 
tionnaire. By contrast, the next case study uses the semistructured interview as a 
means of producing a very detailed case study of a single family's views and behav- 
iors with regard to music. 

The Case Study and Interpretative Qualitative Analysis 

A series of publications by Borthwick and Davidson tackles the issue of how the 
family, as a social unit, supports children's musical learning (Borthwick 2000; Borth- 
wick and Davidson 2002; Davidson and Borthwick 2002). Borthwick noted that al- 
though researchers like Sloboda and Howe had considered the influence of others in 
the child's acquisition of musical skills, they had not looked at interactions within the 
immediate family and how these might influence musical development. She there- 
fore focused her work on the complexity of the interactions within a specific family: 
parent-child, parent-parent, and sibling-sibling. Overall, her work demonstrated
not only that the elder sibling provided a role model for the younger one, but also 
that the parents established a strong "script" in relation to their children's music. The 
concept of a script was adapted by Borthwick from the work of Byng-Hall (1995), a 
family therapist, who defines a "script" as a set of beliefs and behaviors that regulate
the social roles played by each individual within the family. By keeping detailed 
records of observed and reported behaviors and beliefs over a period of 18 months, 
Borthwick was able to argue that music scripts exert a powerful influence on chil- 
dren's musical development and their perception of themselves in relation to other 
family members. For instance, exploring parent-child coalitions, she discovered 
that the father seemed to project his own musical identity as a professional pianist 
onto his eldest daughter, and that the daughter took on the script as part of her
self-identity, believing that she was the inheritor of her father's abilities.

This approach is interesting for two reasons: first, the "script" theory provides a 
framework within which to interpret the highly subjective material that Borthwick 
collected from the family; and second, Borthwick took on the role not only of inter- 
viewer, but also of observer of, and participant in, the family's life. Visiting the fam- 
ily every two weeks for interviews, she was able to observe the children's music mak- 
ing in the home and how it was discussed and supervised by the parents. She was 
also invited to participate in family events like Easter egg making, so that from early 
in the research period she faced few of the "outsider" difficulties that an interviewer 
coming to a family for the first time might encounter. 

Borthwick's approach generated a depth of data far in excess of that produced 
by Sloboda and his colleagues. However, it was extremely time-consuming, and the
data analysis (based on transcripts of interviews, conversations, and a series of di- 
aries that she maintained) was very detailed and labor intensive. Although the use 
of such material is well established in ethnomusicology (see chapter 2, this volume), 
it is only recently that interview, discourse, and diary text has been used for the psy- 
chological analysis of musical behavior (e.g., Davidson and Smith 1997). Within 
psychological, sociological, and educational frameworks more generally, however, 
the 1990s saw an increase in qualitative research of this kind. Techniques such as 
verbal protocol analysis, discourse analysis, grounded theory, and interpretative 
phenomenological analysis have emerged (for an overview, see Robson 1993), all of 
which use small numbers of participants or even single case studies. In all these 
techniques, the intersubjectivity between the participant and the researcher is ac- 
knowledged as part of the research process, requiring that the researcher's own 
thoughts and feelings about the participant are discussed and included as a critical 
part of the analysis. Measures of reliability are not attainable in the same way as in 
quantitative research (that is to say, by conventional statistical methods), but this 
problem can be tackled in two different ways. One approach is for the researcher to 
ask another person with similar analytical skills to examine both the raw data and 
the analysis in order to evaluate the internal consistency of the researcher's analysis. 
An alternative is for the researcher to collect multiple forms of data (e.g., interview 
and diary data) and to use these different sources to "triangulate" the data — that is, 
to access the participants' thought processes from a variety of convergent perspec- 
tives. In either case, the analyses are often taken back to the participant, who may 
be asked whether or not the interpretations are plausible. 

The research methodology adopted by Borthwick is highly interpretative and 
has its roots in the belief that individuals construct a sense of reality on the basis of 
understanding and interacting with a social world (see Gergen and Davis 1985). The 
data are discussed as constructions rather than realities, and as a researcher
Borthwick considered reflexivity to be central to her work. This involved making explicit
the processes by which her data and analyses are produced, acknowledging her per- 
sonal interests and values as a researcher and the way in which these influenced the 
research process. Unlike Sloboda and Howe's approach, which does not mention the 
role of the interviewer in eliciting particular responses from the interviewees, Borth- 
wick acknowledged that the interview data she collected are negotiated between her 
and the interviewee at a particular point in time. As part of this, she discussed her 
own family background and how this may have influenced her in the development 
of her questionnaires and the way in which she approached her interviewees. 

Basing her work on individuals' accounts rather than attempting to produce an 
objective statement, Borthwick compared the interpretation of her interviewees' ex- 
periences with looking for meaning in a text. She assumed that what was said by the 
participants was significant, and that there was some relationship between this and 
their more enduring beliefs or constructs. She drew on a particular analytical tech- 
nique which has emerged out of theoretical work on social constructionism and 
reflexivity — in this case, interpretative phenomenological analysis (IPA). Used ex- 
tensively by Smith (1996, 1999), this technique is based on the assumption that 
interviewees do not express all their thoughts and feelings, so that what they say 
must be interpreted in the light of other observations of their behavior. The proce- 
dure involves examining transcripts and other forms of data for themes. The re- 
searcher does this pragmatically, making summaries of interviews, lists of associa- 
tions and potential connections between them. Main themes and subthemes are 
created and then discussed, the aim being to produce a "grounded analysis" — that 
is, an analysis based in and emerging from the data. Smith's work has been in the 
area of health psychology, and the objective entity of the patient's body provides a 
solid backdrop onto which subjective psychological accounts of the physical 
processes can be projected. In the same way, Borthwick used the musical instrument 
and progress on it (assessed through examination results and reports from teachers) 
as the evidence in terms of which she could interpret the interviewees' accounts of 
themselves, their relationship with the instrument, and the role of others in shaping 
their musical development. 

The advantages of IPA are that it produces very detailed data specific to a par- 
ticular group or individual, and that it facilitates an intimacy with the data (and ar- 
guably the participants), in a way that formally structured quantitative analyses do 
not; at the same time, the technique of triangulation serves to counteract what might 
otherwise be seen as the excessive informality or subjectivity of the approach. While 
the analysis of verbal or written material may appear to be less technical than quan- 
titative analysis, and in that sense easier, qualitative techniques require sensitivity 
and critical insight that depend on a rigorous training: qualitative research is by no 
means an easy option, as it is heavily dependent on the researcher's skills with text 
and interpretation, and the ability to write about his or her experiences. This is not 
to say that quantitative research does not also require skills of interpretation, but the 
results obtained from quantitative research are usually contained within a more gen- 
erally agreed interpretative framework, and make reference to more formally defined 
notions of significance. 

These considerations demonstrate the importance of finding an appropriate
theoretical and practical approach to the issues and materials of a research topic. 
Nowhere is this more evident than in the subject of the next section: research into 
the influence of social factors on the shaping of musical tastes. 



Social Factors in Musical Taste 

People nowadays have access to a huge range of musics from across the world, 
through both live performance and broadcast or recorded media. Yet, despite this 
global potential, research demonstrates what most of us probably suspect already — 
that different types of music appeal to people of differing social groups. As Russell 
(1997) points out, the audience at a Country and Western concert will probably have 
little in common with the members of an opera house audience. People's tastes are 
heavily influenced by a huge variety of social factors, and a primary aim of research 
in this area has been to pinpoint the principal sources of influence. The results have 
typically shown consistent social class, age, and gender biases (DiMaggio and Useem 
1978, Pegg 1984, Abeles and Chung 1996, Russell 1997). 

Research into musical tastes has generally made use of surveys or question- 
naires, in the same way as the large-scale study by Sloboda and his colleagues. The 
techniques Sloboda et al. employed were straightforward: structured questionnaires 
consisting of closed questions designed for the specific groups targeted, with the 
completed questionnaires analyzed simply by counting the frequency of different re- 
sponses. The researchers did not attempt to survey large numbers of the population, 
but they drew upon representative samples from different age groups and both 
sexes, in order to give focus to their research while at the same time avoiding undue 
narrowness. But social psychological research in the domain of musical taste has not 
always involved simply asking people to rate or report how much they like or dis- 
like a particular kind of music. Surreptitious methods of data collection have also 
been used, and these raise a further set of issues. A concrete example is provided by 
North and Hargreaves's (1996) study of responses to music in a university canteen. 
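
The frequency counting described above for closed-question surveys can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the record layout, and the sample answers are invented and are not drawn from any of the studies cited.

```python
from collections import Counter


def tally_responses(responses):
    """Count how often each answer option was chosen for one closed question."""
    return Counter(responses)


def tally_by_group(records, question):
    """Tally answers to `question` separately for each respondent group.

    `records` is a list of dicts, each with a "group" key and one key per
    question (a hypothetical layout chosen for this sketch).
    """
    tallies = {}
    for record in records:
        group_tally = tallies.setdefault(record["group"], Counter())
        group_tally[record[question]] += 1
    return tallies


# Invented illustrative data, not taken from any study.
records = [
    {"group": "under 25", "likes_opera": "no"},
    {"group": "under 25", "likes_opera": "no"},
    {"group": "under 25", "likes_opera": "yes"},
    {"group": "over 25", "likes_opera": "yes"},
]

for group, tally in tally_by_group(records, "likes_opera").items():
    print(group, dict(tally))
```

Splitting the tally by group is what allows response frequencies to be compared across age groups or sexes, as the surveys described above set out to do.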

Surreptitious Methods of Data Collection 

The aim of North and Hargreaves's study was to test a theory of aesthetic response 
to music proposed by Berlyne (1971). The theory proposes a relationship between 
structural complexity and liking (listeners prefer music that is neither too simple nor 
too complex), so North and Hargreaves used music with low, moderate, and high 
levels of complexity. Since the study drew on university students as participants, 
they decided to play a kind of music that they believed the students might
recognize: New Age. They undertook a series of pilot investigations in order to ensure
that the specific musical materials were unfamiliar (so that previous knowledge of a 
particular track could not influence judgment), but nonetheless readily recognizable 
as belonging to the same genre. This meant that comparisons between the different 
musical examples would be valid, ideally reflecting only differences in complexity. 
In order to ensure appropriate differences in musical complexity, the pilot
investigations were also designed to confirm that the materials conformed with Berlyne's
general definition of aesthetic complexity: high-complexity excerpts were
"unpredictable, erratic, and varied," while low-complexity excerpts were "predictable, simple,
and uniform." All excerpts for the pilot study were 30 seconds long, and were se- 
lected to be representative of the piece from which they were taken. In the experi- 
ment proper, they played the tracks over loudspeakers in the canteen, and in a ran- 
dom order. 

The authors were concerned that the study should be as "ecologically valid" 
(i.e., naturalistic) as possible. For this reason the questions about musical preference 
were concealed within a general questionnaire survey that asked the students about 
the canteen environment. So as to give credibility to this apparent aim, the experi- 
menters went around the canteen, asking the students what aspects of the environ- 
ment they might like to change. The surreptitious assessment of the students' musi- 
cal preferences was accomplished by noting how readily they cited the music as an 
aspect of the environment which they might like to change, or how much they liked 
it; the researchers also recorded the times at which each questionnaire was filled in, 
so that they could cross-refer it to the music being played at that time. The results 
were in line with Berlyne's theory, with moderately complex music being preferred. 
The surreptitious nature of the research technique enabled the researchers to show 
that as dislike of the music increased, the music became more salient. Had the ex- 
perimenters asked the students about the music in a more direct fashion, they would 
have drawn attention to the music, eliciting a more self-conscious assessment of 
these students' musical preferences. 
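
The cross-referencing step described above, in which the time recorded on each questionnaire is matched to the music playing at that moment, can be sketched as follows. This is only an illustration under stated assumptions: the playlist format, the function name, and the times are hypothetical, and the sketch assumes each track plays until the next one starts.

```python
from bisect import bisect_right


def track_playing_at(playlist, t):
    """Return the label of the track playing at time t, or None if t is
    earlier than the first track.

    `playlist` is a list of (start_minute, label) pairs sorted by start
    time; times are minutes from the start of the session (a hypothetical
    format chosen for this sketch).
    """
    starts = [start for start, _ in playlist]
    i = bisect_right(starts, t) - 1  # index of the last track starting at or before t
    return playlist[i][1] if i >= 0 else None


# Invented schedule and questionnaire times, for illustration only.
playlist = [(0, "low complexity"), (30, "moderate complexity"), (60, "high complexity")]
questionnaire_times = [12, 45, 61]

for t in questionnaire_times:
    print(t, track_playing_at(playlist, t))
```

Each questionnaire can then be grouped by the complexity level that was audible when it was completed, without the respondent having been asked about the music directly.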

After this brief consideration of work on audience behavior, the next section 
considers research into social interaction between performers, with a specific focus 
on how people work together to create a musical performance. 



Ensembles as Musical and Social Groups 

There have been only a few studies exploring group processes in music, most of 
them either small-scale qualitative investigations or single case studies. The princi- 
pal reason for this is that in order to assess how a group operates, it is necessary to 
have a detailed account of their daily practices and operating procedures, and qual- 
itative approaches permit this depth of enquiry. Murnighan and Conlon's (1991) 
work is the closest to a more generalizable investigation, combining quantitative 
analysis with qualitative techniques. They initially contacted 21 string quartets by 
letter and phone, of which all but one agreed to take part. The study took the form 
of semistructured interviews with individual quartet members, which lasted be- 
tween 45 minutes and four hours. The data were then quantified where possible, 
and quotations were used as a basis for discussions between the experimenter and 
the players, in which thematic issues were developed. From the broad range of data 
collected, it seems that the success of a quartet depends on social factors within the 
group (such as the way in which the second violinist interacts with the other play- 
ers), as well as on issues of skill and repertoire. 

Murnighan and Conlon's work emerges out of the same tradition exemplified 
by Sloboda and his colleagues' work on musicians' biographies. An alternative to the
questionnaire approach is to study ensemble behavior in a more direct manner, an 
example of which is a study by Davidson and Good (2002); this used various kinds 
of directly recorded data to focus on the rehearsing and performing of a student 
string quartet. 

A Direct Approach to Ensemble Research 

Davidson and Good (2002) studied the musical and social processes of this quartet 
both in their general interactions with one another, and when rehearsing and per- 
forming. They did this by adopting a number of research techniques. First, in indi- 
vidual semistructured interviews, they asked the members of the quartet to give 
background details about how the ensemble had formed. Then questions were asked 
about who led both the musical and the social interactions between the quartet 
members (how the music should be played, how repertoire was chosen, which in- 
dividuals were dominant, and so on); this revealed the players' own views about 
the dynamics of the group. Then, in a second stage, the researchers simply placed a 
video camera in the rehearsal room. They analyzed the recordings using the follow- 
ing categories, for which detailed criteria were drawn up: 

• Social conversation (general topics related to friendship, jokes, etc.). 

• Nonverbal social interaction (related to nonmusical issues, and including 
physical contact, gestures, degree of proximity, looking behaviors, etc.). 

• Musical conversations (discussions about technical or expressive points in 
the music). 

• Nonverbal musical interactions (gestures demonstrating a musical purpose: 
coordinating entrances and exits, expressive gestures for particular 
passages, etc.). 

• Musical interactions (dynamics, timing profiles, and when the music starts 
and stops). 

The two authors carried out the analysis together, checking with one another 
to confirm or disconfirm their ideas. Examples of each of the categories were also 
checked by an independent evaluator as a measure of the reliability of the research- 
ers' assessments. Performance data — a video recording of the concert for which the 
quartet had been rehearsing — were analyzed in the same manner as the rehearsal 
data. All members of the quartet were then asked to view the rehearsal and perform- 
ance videos, giving feedback on the nature of their musical and social behavior. These 
discussions were also videotaped, so that they could be aligned with the original 
video footage, and they were themselves subjected to a similar form of analysis. Fi- 
nally, when all this material had been analyzed, the researchers took their analysis 
back to the four players for further commentary. This was in part to ensure that the 
analysis was a fair representation of the players' experiences, but more importantly, 
it served as an opportunity for further issues to emerge. One of these issues (the 
dominance of the male second violinist and a sexual dynamic between him and 
the female first violinist) caused a little embarrassment, but all players spoke about 
the role this individual had in the group, and how it had shaped both the general 
conversation and the first violinist's style of leadership. Had the researchers not fed
their analysis back to the players, this issue might never have been clarified. 
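
Where two people assign the same video segments to categories, as in the independent check described above, their agreement can be quantified. The sketch below shows one common way of doing so, simple percent agreement together with Cohen's kappa, which corrects for agreement expected by chance. The text does not say which measure, if any, Davidson and Good used, so this is an assumption, and the category labels and codings are invented.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Proportion of segments that both coders put in the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("codings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability that both coders pick the same category
    # if each chose independently according to their own category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Invented codings of four video segments, for illustration only.
researcher = ["social", "musical", "musical", "social"]
evaluator = ["social", "musical", "social", "social"]

print(percent_agreement(researcher, evaluator))
print(cohens_kappa(researcher, evaluator))
```

A kappa near 1 indicates agreement well beyond chance, while a value near 0 indicates that the coders agree no more often than their category frequencies alone would predict.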

The high level of participation in this project by the players made it a dynamic 
enterprise. The data have a degree of richness and detail that is often missing in so- 
cial studies. The advantages of this kind of research are that the data are collected 
under real-world conditions; that the participants are able to give their interpreta- 
tions and provide points of clarification for the analysis; and that it provides results 
that are very specific to the individuals concerned, through accessing their individ- 
ual and collective experiences. The disadvantages are the cumbersome and time- 
consuming data collection and feedback procedures. 

The research described above has hinted at the role of the individual in the for- 
mation of group dynamics and so in the creation of a musical performance. It is ap- 
propriate, therefore, to conclude this chapter by considering how the differences 
between individual musicians have been researched, and what kinds of traits have 
been found. 



Individual Differences 

In general psychological research, there was a surge of interest during the 1930s and 
40s in the extent to which individuals differed from one another, and how the
differences between them could be categorized (see Atkinson, Atkinson, Smith, Bem,
and Hilgard 1990 for details). This research was fueled by the need to find measures 
of skill and ability for recruitment to the armed services. Broadly speaking, three
kinds of personality test were developed: questionnaires or inventories, projective 
tests, and objective tests. These can be classified under two headings (Kline 1994): 
nomothetic, that is looking for aspects of personality that are common to different in- 
dividuals; and idiographic, attempting to assess those that are peculiar to the indi- 
vidual. Personality inventories tend to be nomothetic, while projective tests are idio- 
graphic; objective tests may be of either type. The examples below will give a flavor 
of these tests. 

Personality Questionnaires or Inventories 

Personality questionnaires consist of sets of items (statements or questions) relevant 
to the personality variables that the test measures, and to which subjects have to re- 
spond. When designing a test, the choice of variables to be examined is a complex 
matter, and researchers have shown that there is a vast number of personality traits 
which might be measured. Some have argued that there could be as many
personality traits as there are descriptive terms for behavior. However, Digman (1990) has
shown that there is a considerable degree of overlap between the variables, as a re- 
sult of which five broad traits, together with about 50 independent traits, account 
for a large proportion of the variation between individuals. 

The five broad traits have been studied for many decades and inventories have 
been designed to access them; these usually comprise items which have to be rated 
as true or false about preferred forms of behavior — for instance, "I always lock my 
door at night." The traits are labeled as extraversion, agreeableness,
conscientiousness, neuroticism, and openness (see Goldberg 1993 for more details). As for the
independent traits, a number of tests have again been developed, but the most
relevant here is Cattell's Sixteen Personality Factor questionnaire (Cattell, Eber and
Tatsuoka 1970): this is now widely used and standardized, and it is Cattell's inven- 
tory that has been used in the most well-known research on musicians' personali- 
ties. This research is summarized in Kemp (1996), and shows that musicians have 
different types of personalities from nonmusicians. Within musicians, composers are 
found to be more imaginative and sensitive than others, while brass players are more 
outgoing and boisterous than string players. As a group, however, there is a general 
tendency for musicians to be introverted, and they display much higher levels of an- 
drogyny in their personalities than nonmusicians. Kemp uses these findings as the 
focus for a discussion of whether it is an individual's personality per se, or the social 
factors surrounding him or her, that leads to these rather uniform behaviors. This is, 
of course, the central issue: what can these results usefully demonstrate? Clearly they 
can show that there are group characteristics (the brashness of brass players, for in- 
stance), and these can be compared with those of other groups (visual artists, say, or 
white middle class males). The data do not, however, enable researchers to say why 
such behaviors occur: the question which Kemp addresses has the look of chicken- 
and-egg about it. 

The details of such personality measures need to be carefully studied before 
empirical work is undertaken, but many investigators find these standardized tools 
very useful and they have a long track record. They are also publicly available and 
easy to score, and have been extensively tested for reliability. Moreover, because they 
are standardized, an individual's personality can easily be tested by administering 
the same inventory on a number of occasions, and the resulting profile can be com- 
pared to norms derived from the typical profiles of various sample groups. 

Kemp's book is a good starting point for exploring such inventories; the detailed 
discussion of his own research shows how the standard questionnaires can be mod- 
ified for use with musicians, and how specific data can be tested against the norms. 
There are, however, two main disadvantages of the inventory technique. First, items 
have to be short in order to cover all the dimensions of the personality, and such 
brevity can lead to simplistic statements which fail to capture the complexities of an 
individual and his or her behaviors. Second, responses can be faked. For instance, 
if you complete an inventory in a job interview which bluntly asks whether or not 
you are shy, unreliable or erratic, you may not be inclined to give an honest answer. 

Open-Ended or Projective Tests 

Projective tests emerged from Freudian psychodynamic theory, with its emphasis on 
imaginative production and its attempt to gain insights into subconscious or hidden 
aspects of the personality. A typical test of this kind is Rorschach's inkblot (Ror- 
schach 1921; see Kline 1994, for details). In the test a series of 10 cards, each with 
a different size and color of inkblot, is shown to the participant and she or he is 
asked to report everything that each inkblot resembles. The responses are examined 
according to three main categories: 
• Location (whether all or part of the inkblot is referenced)

• Determinants (whether it is shape, texture, or shading that is reported as 
being important) 

• Content (what the blot represents) 

Although some systematic criteria have been applied to the response categories in 
order to allow for comparisons between individuals, the responses typically provide 
a qualitative impression of the participant's general disposition, for instance in terms 
of competition or cooperation, or a focus on positive or negative images. 

The advantage of projective tests is that they are a rich source of data. They
seem particularly useful in the exploration of subtle aspects of personality that can- 
not easily be categorized and described, such as an individual's contrariness. How- 
ever, these are highly subjective interpretations, and although the rationale is that 
the participant's inner world is tapped, guiding a person through a test, and the sub- 
sequent interpretation of the data, can be problematic unless the questioner has psy- 
choanalytical skills. 

Objective Tests 

The essential feature of objective tests is that their purpose cannot be guessed by 
subjects. Typically questionnaires are used in these tests, but the data are not the an- 
swers to the questions: rather it is the behavior accompanying the completion of the 
questionnaire that counts. For example, the degree of acquiescence demonstrated in 
answering oral questions might be used as a measure of the participant's anxiety 
level. Another measure of anxiety level is the "Fidgetometer," which uses a special 
chair with electrical contacts which are sensitive to the subject's movements; a score 
is derived by calculating the number of movements over a fixed amount of time. 
Objective tests seem appealing because they are hard to "fake," but as Kline (1994) 
points out, despite his own efforts and those of other psychologists interested in in- 
dividual testing, the validity of these objective tests has yet to be ascertained. 
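
The scoring rule described for the "Fidgetometer" (the number of movements registered over a fixed amount of time) amounts to a simple count within a time window. The sketch below illustrates that arithmetic only; the function name, the use of seconds as the unit, and the sample timestamps are assumptions, and nothing here describes the actual apparatus.

```python
def movement_score(event_times, window_start, window_length):
    """Count movements registered in [window_start, window_start + window_length).

    `event_times` holds the time in seconds at which each movement was
    registered (a hypothetical representation chosen for this sketch).
    """
    window_end = window_start + window_length
    return sum(window_start <= t < window_end for t in event_times)


# Invented timestamps, for illustration only.
events = [1.0, 2.5, 14.2, 59.9, 61.0]
print(movement_score(events, window_start=0, window_length=60))
```

Fixing the window length in advance is what makes scores comparable between participants, since a longer observation period would otherwise inflate the count.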

In summary, individual differences can be tested in a number of ways. The es- 
sential problem facing the researcher is finding a tool which will access an individ- 
ual's traits in a manner that is both sensitive and reliable. From the discussion above, 
it is clear that there are problems as well as advantages with all the different research 
techniques available. 

Personality and Preferences 

In addition to looking at personality per se, tests can be used to examine the way in 
which individuals' preferences and tastes are affected by their personality. An ex- 
ample of this in music is Rawlings and Ciancarelli's (1997) use of the Neuroticism, 
Extraversion, Openness (NEO) Personality Inventory (developed by Costa and Mc- 
Crae 1985), to explore which types of personality correlate with a preference for 
specific types of music. Given the accessible nature of this study, it is a useful model 
of how individual difference research can be carried out in musical contexts. 

Rawlings and Ciancarelli employed 10 categories of question, each made up of 
several items that identify specific types of music. In framing their questions they made 
use of a standardized measure of musical preference (Litle and Zuckerman's Music
Preference Scale, 1986), to which the researchers added more recent types of music 
such as "acid jazz" to update the measure, basing their modifications on discussions 
with representatives of a large music store. Working from a typed questionnaire 
sheet, participants rated how much they liked each item. The resultant data were
analyzed against the NEO Personality Inventory scores; typical findings were that females liked
popular music styles more than males did, and that individuals who scored strongly 
on the personality dimension "openness" enjoyed a wide range of musical styles. 



Conclusion 

This chapter has sampled just some of the wide variety of techniques available to ad- 
dress research into music as social behavior. But a few key issues emerge which apply 
to all such work, and these might be summarized as follows. In the first place, the 
quantity of research in music behavior is quite limited, which means that researchers 
need to be creative and open-minded in their approach. In other words, do not be 
put off by a lack of precedent: some of the most engaging and penetrating research 
emerges out of lateral thinking and an eclectic approach to empirical design. At the 
same time, you need to be systematic, persistent, and reflexive: it is essential to apply 
the appropriate research technique correctly, in the sense of fully understanding 
what data are being collected, and how these data may be analyzed. The research 
technique you adopt will determine how explicitly you set out your hypotheses, or 
how far you follow up emergent themes; however, it is always important to look for
ways of examining the data as fully as possible, and to be aware of possible prob- 
lems. Constant reflection on working processes is necessary if the result is to be co- 
herent. Finally, analyzing music as social behavior means analyzing how people en- 
gage with music and with one another at individual, group, and societal levels, and 
that in turn means that ethical considerations cannot be ignored. Most academic in- 
stitutions, health authorities, and other public sector institutions in which research 
takes place run ethics committees; these monitor proposed research projects so as to 
ensure that no one is exploited, mistreated, or deliberately subjected to physical or 
psychological discomfort. If you are going to undertake social research, make sure 
your proposal is seen and approved by an ethics committee, or failing that by at least 
three experienced researchers. Getting the design right in the planning stages can save 
a lot of anxiety later. 

Exploring music and social behavior is both stimulating and challenging. Face 
that challenge with energy, an open mind, and sensitivity to others, and your re- 
search could make an important contribution to an ever-expanding field. 



Notes 

1. For a discussion of independent and dependent variables, see chapter 9, this 
volume. 

2. A comparison may be drawn with the participant-observation approach common in 
ethnomusicology; see chapter 2, this volume. 
CHAPTER 5



Empirical Methods in the Study 
of Performance 

Eric Clarke 

Introduction 

From the perspective of musicology the empirical study of performance has coincided 
with a move away from the primacy of the score and toward increasing interest in 
music as performance. The rise of performance studies as a research area has brought 
a focus on different performance traditions, the nature of performance interpretation 
and its relationship to analysis, and the legacy of historical recordings (now dating 
back 100 years) and what it can tell us about changes in performance styles. Musi- 
cologists have been interested in empirical studies of performance as ways of docu- 
menting what goes on in performance, and for their ability to make performance a 
concrete object of study with the same tangibility that was previously confined to 
scores and sketches. 

Although performance occupies a central position in just about every musical 
culture, systematic studies of performance go back only to the turn of the twentieth 
century. The reason for this is the problem of transience: only once methods had 
been developed to record either the sounds of performance, or the actions of in- 
struments, was any kind of detailed study possible — and so the piano roll, record, 
magnetic tape, and computer have all played their part at different stages in the short 
history of empirical studies of performance. Gabrielsson (1999) provides a survey of 
empirical studies starting with Binet and Courtier's (1895) study of piano playing, 
and demonstrates the rapid growth in the field (particularly since about 1980),
while the 37 separate chapters in Rink (2002) and Parncutt and McPherson (2002) 
indicate how active the field continues to be. A significant factor in this development 
has been the involvement of psychologists, to whom music performance has ap- 
pealed as an area of study on various grounds. It represents an example of a very so- 
phisticated and complex motor skill, on which there is a wider research literature in 
psychology; it has affinities with language (about which there has been a great deal 
of psychological research), but represents a distinct form of nonverbal communica- 
tion; it provides an opportunity to study rhythmic and other temporal skills at var- 
ious levels of expertise; and it provides a "window" onto hidden cognitive processes 
in music. For these and other reasons, and because of the converging interests of 
psychologists and musicologists, performance research is perhaps the most developed 
area of empirical musicology, though the differing motivations for studying performance
have given rise to certain problems, as I discuss at the end of this chapter.
Because of the specific issues investigated, and for technical reasons, the great 
majority of empirical work on performance has looked at keyboard performance. 
The keyboard offers a number of specific advantages for empirical research: 

• It provides a ready-made context in which to study the coordination and con- 
trol of concurrent tasks, because the two hands essentially "do the same thing" 
(albeit in mirror image). 

• There is a large and varied repertoire of solo music for the instrument, provid- 
ing the opportunity to study individual performance in an entirely realistic 
way, as well as a large ensemble repertoire (including piano duets and four- 
hand material). 

• The percussive character of the instrument makes it an effective way to study 
rhythmic skills, and makes accurate timing analysis from sound files possible 
(due to the sharp onsets of events). 

• It is easier and less intrusive to take direct mechanical measurements from the 
keyboard than from almost any other instrument, due to the physical separa- 
tion of the instrument from the performer. 

• Since the early 1980s a range of commercially available keyboard instruments 
has existed that can be directly monitored by computer, allowing performance 
data to be captured and analyzed easily and with considerable precision. Since 
the mid-1980s this range of instruments has included real pianos. 

The principal focus of this chapter is therefore on the study of keyboard perform- 
ance. 

Landmarks 

To give a quick overview of how the empirical study of performance has developed, 
the following is a list of some of the publications that have played an important part 
in defining the field: 

• In the early 1930s, Seashore and his research associates developed an exten- 
sive research program in music performance at the University of Iowa, much 
of which is brought together and reported in Seashore's summative book 
[1967 (1938)]. This represents the earliest extensive and systematic empirical 
work on performance, and identified many of the issues that have remained 
the preoccupations of subsequent research. 

• Povel (1977) describes expressive timing in a number of Bach harpsichord 
performances, and Bengtsson and Gabrielsson (1977) in performances of 
Swedish folk tunes. Together, these represent the first significant publications 
of the "modern" period of performance research. 

• Shaffer (1981) is the first substantial paper to report results obtained from 
direct computer monitoring of the piano. The paper concentrates on timing, 
coordination, expression, and the cognitive representation of complex move- 
ments. 
• Sundberg, Frydén, and Askenfelt (1983) is the first published attempt to produce
an artificial model of performance expression, using a collection of separate
rules that relate to local features of the music. Todd (1985) is a subsequent
attempt to achieve the same goal using only a single rule applied
recursively. 

• Repp (1990a) is the first paper to look at a larger body of performance data, 
using commercial recordings and extracting performance data from the re- 
corded sound. Since that first paper, Repp has gone on to investigate larger 
collections of performances, in some cases analyzing over 100 recorded per- 
formances of the same work. 

• Davidson (1993) is the first published work analyzing the visual component 
of expressive performance. 

• Rink (1995) represents the first large-scale publication bringing together 
musicologists and psychologists in the study of performance. 

With this overall pattern of development in mind, the following sections discuss the 
different methods that have been used and some of the issues that have been tackled. 



Using MIDI to Study Performance 

As already observed, a survey of the existing empirical work on performance dem- 
onstrates that by far the largest body of research has examined data derived from the 
direct measurement of keyboard performances. Until the mid-1980s, this was only 
possible by building a specialized technical setup (examples of which are Seashore's 
piano camera in the 1930s, and Shaffer's piano/computer interface designed by him
in the 1970s), or by using synthesizer keyboards. From the early 1980s, synthesizer 
keyboards could be connected to tone generators and computers using a specific 
digital communications protocol called the Musical Instrument Digital Interface 
(MIDI), making it possible to record and store on a computer all the keyboard events 
of a performance. But because MIDI was initially implemented on keyboards that had
poor touch characteristics and produced unconvincing approximations to piano
sound, little in the way of serious research was possible. In the mid-1980s, however,
Yamaha produced first a synthesizer keyboard with more realistically weighted keys, 
and then a MIDI grand piano — a standard acoustic grand piano fitted with a photo- 
electric cell system which picked up the movements of the piano's keys, hammers, 
and pedals, and translated them into MIDI signals. This was essentially a commercial
implementation of the type of system devised by Shaffer (see above), but with
the advantage that it used a standardized method of data transfer, and could there- 
fore be connected to any MIDI compatible computer. Soon after this, Yamaha came 
up with their Disklavier system (which is essentially the same as the MIDI piano, but 
with the added facility that the piano can be used to play back files, as well as to 
record them), while Bösendorfer developed their own sophisticated (and expensive)
equivalent. 

Although the following discussion is based on MIDI data from keyboards, it is 
worth noting that other kinds of MIDI controllers exist and can be used for performance
research. There are MIDI drum pads, wind controllers, guitar and other
string controllers, as well as pitch-to-MIDI conversion systems designed to take 
input from a microphone and to convert the signal into MIDI information. This last 
category of device has the potential to extend the direct measurement method to the 
study of any instrument, including the human voice, and to make it possible to take 
ordinary sound recordings and convert them to MIDI. However, there are serious
limitations with existing pitch-to-MIDI conversion systems: they are not yet reliable 
or robust in performing the conversion and thus provide rather inaccurate and un- 
reliable data, they cannot deal with anything other than a single melodic line, and 
they are sensitive to any sounds that might mask, or interact with, the instrumental 
signal. Nonetheless, advances in signal processing technology and software are likely 
to make this kind of conversion a practical reality in the future. 

General introductions to MIDI can be found in a number of readily-available 
sources (e.g., Penfold 1990, Manning 1993). What follows is a simple and non-tech- 
nical description of only those features of MIDI that are relevant to performance re- 
search. MIDI is essentially a digital coding of the features of a controller (i.e., a per- 
formance instrument of some kind) into a form that can either be used to manipulate 
a sound source to which the controller is connected, or simply stored in a file on a 
computer. Since the keyboard is the only controller represented in the existing re- 
search literature, the discussion here is confined to the way in which MIDI encodes 
keyboard events. Six features of keyboard performance are captured by MIDI: 1 

1. The identity of any key that is struck

2. The time at which a note starts 

3. The time at which a note stops 

4. The velocity of the key press or of the piano hammer as it hits the string 
(which is directly correlated with the loudness of the sound produced) 

5. The time at which either the sostenuto or soft pedal is depressed 

6. The time at which either the sostenuto or soft pedal is released 

These data can be captured and stored by most desktop or laptop computers running 
any one of a large number of commercially available software packages — usually a 
sequencer of some kind (common ones include Vision, Cubase, and Logic, as well 
as the more sophisticated graphical programming environment called Max). Most of 
the data described above provide information that is completely self-explanatory: 
the resulting file records which notes are played, at what time, with what velocity, at 
what time they are released, and what pedaling might affect them. Two further pieces 
of data are also available, but not quite so directly: interonset interval (IOI), and ar- 
ticulation. The interonset interval of a note, usually taken to be a note's primary 
rhythmic property, is defined as the time from the start of any note to the start of the next 
note in the same part. The identification of a "part" in the definition given here can be 
tricky, but is equivalent to the rhythmic stream to which a note belongs (the rela- 
tionship between the onset of a note in one part and the onset of an adjacent note in 
the other part is usually meaningless as a rhythmic property — though it can indicate 
something interesting about synchronization). In the case of a note-against-note texture
(such as a two-part invention), the definition of each part is clear enough. Other
textures may be rather more difficult to partition into streams, and may necessitate
making pragmatic (rather than strongly theoretically founded) decisions. 
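
The IOI definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `NoteEvent` record and its field names are hypothetical (they belong to neither MIDI nor any package mentioned in this chapter), and the assignment of notes to parts is assumed to have been made already.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class NoteEvent:
    """Hypothetical record of one keyboard note; times are in milliseconds."""
    part: str      # rhythmic stream to which the note has been assigned
    pitch: int     # MIDI key number
    onset: float   # time at which the note starts
    offset: float  # time at which the key is released
    velocity: int  # MIDI velocity, 0-127


def interonset_intervals(notes):
    """Return {part: [IOI, ...]}, measuring onset to next onset within each part."""
    onsets_by_part = defaultdict(list)
    for note in sorted(notes, key=lambda n: n.onset):
        onsets_by_part[note.part].append(note.onset)
    return {
        part: [later - earlier for earlier, later in zip(onsets, onsets[1:])]
        for part, onsets in onsets_by_part.items()
    }


# Invented two-part example: onsets in the two parts are never compared.
notes = [
    NoteEvent("upper", 72, 0.0, 480.0, 64),
    NoteEvent("lower", 48, 5.0, 900.0, 50),
    NoteEvent("upper", 74, 512.0, 950.0, 60),
    NoteEvent("lower", 50, 1010.0, 1900.0, 52),
    NoteEvent("upper", 76, 1005.0, 1500.0, 66),
]
iois = interonset_intervals(notes)
```

The last note of each part has no following onset and therefore receives no IOI; how to treat such notes is one of the pragmatic decisions mentioned above.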

The second piece of "hidden" data is articulation. The articulation of a note on 
a keyboard instrument is determined by the relationship between the sounding and 
silent parts of a note's time span: a staccato note is one in which the sounding por- 
tion of a note's IOI is shorter than the total IOI (i.e., part of the IOI is silent), and a 
legato note is one in which the sounding portion is either the same as, or greater than,
the total IOI (in this last case overlapping with the following note in the same part).
The information to determine a note's articulation, therefore, is found in the rela- 
tionship between the value of the IOI, and the period during which the key is actu- 
ally depressed (given by the time difference between features 3 and 2 in the list of 
six above). This value can be further influenced by pedaling: if the sostenuto pedal 
is depressed prior to, or during, the sounding part of the note, then the critical value 
which determines the offset (end) of the note is the release of either the note
or the pedal, whichever is the later.
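
As a rough sketch of this calculation, articulation can be expressed as the ratio of a note's sounding duration to its IOI. The function names and the representation of pedaling as (down, up) time pairs are assumptions made for illustration only.

```python
def sounding_duration(onset, key_release, pedal_spans=()):
    """Sounding time of a note in ms.

    pedal_spans is a sequence of (pedal_down, pedal_up) times in ms. A pedal
    that goes down before or during the sounding part of the note postpones
    the offset to the pedal release, if that comes later than the key release.
    """
    offset = key_release
    for pedal_down, pedal_up in pedal_spans:
        if pedal_down <= key_release and pedal_up > onset:
            offset = max(offset, pedal_up)
    return offset - onset


def articulation_ratio(onset, key_release, next_onset, pedal_spans=()):
    """Sounding duration divided by IOI: below 1 is detached, 1 or more is legato."""
    ioi = next_onset - onset
    return sounding_duration(onset, key_release, pedal_spans) / ioi


# Invented values: the same 500 ms IOI with three different treatments.
staccato = articulation_ratio(0, 250, 500)
legato = articulation_ratio(0, 520, 500)
pedalled = articulation_ratio(0, 250, 500, pedal_spans=[(100, 700)])
```

In the third case the key is released after 250 ms, but the pedal holds the sound until 700 ms, so the note counts as overlapping its successor.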

Most sequencer programs store MIDI information in a format that is appropri- 
ate for studio use but awkward for analytical purposes. In order to overcome this dif- 
ficulty, and to provide various analytical and transforming possibilities, a software 
environment called POCO has been developed that can convert standard sequencer 
files into a variety of more readable text formats, provides the means to strip out se- 
lectively all kinds of information from the original performance file, and allows the 
transfer of data directly into a variety of statistics and graphics packages. The soft- 
ware can be used via the Web, and its rationale and structure (at an earlier stage of 
development) are described in Honing (1990). 2 At the most basic level it converts 
the timing of events from the "bars, beats, and ticks" format that most sequencers 
use into seconds and milliseconds (ms), and the velocity (dynamic) data from the
0 to 127 values that MIDI employs into a scale from 0 to 1.
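
The two basic conversions can be illustrated as follows. This is a sketch rather than a description of how POCO itself works: it assumes a constant tempo, assumes that a bar-and-beat position has already been flattened into a running tick count, and uses hypothetical function and parameter names.

```python
def ticks_to_ms(ticks, tempo_bpm, ticks_per_beat):
    """Convert elapsed sequencer ticks to milliseconds at a constant tempo."""
    ms_per_beat = 60_000 / tempo_bpm
    return ticks * ms_per_beat / ticks_per_beat


def scale_velocity(velocity):
    """Map a MIDI velocity (0 to 127) onto a scale from 0 to 1."""
    if not 0 <= velocity <= 127:
        raise ValueError("MIDI velocity must lie between 0 and 127")
    return velocity / 127


# One and a half beats at 60 beats per minute, with 480 ticks to the beat.
onset_ms = ticks_to_ms(720, tempo_bpm=60, ticks_per_beat=480)
loudness = scale_velocity(64)
```

A file containing tempo changes would need the conversion applied piecewise, one constant-tempo stretch at a time.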

Other than this simple conversion, MIDI velocity data need no transformation: 
they give a direct and immediately interpretable picture of the variations in loudness 
over time. 3 The IOI values are rather less transparent, since the duration of any note 
in a performance is a product of the duration specified by the notation combined 
with any stretching or contraction of that value introduced by the performer. It is 
these relatively small departures from what can be termed the "canonical value" of a 
note (as specified by the notation) that tend to be of interest in empirical studies of 
performance, since they are taken to indicate expressive effects (see below). For this 
reason, it is not usually the absolute value of the difference between a note's actual 
duration and its canonical value that is of interest, so much as its proportional in- 
crease or decrease. Take the case of crotchets alternating with pairs of quavers at an 
indicated tempo of 60 crotchets per minute. If played exactly as notated, each 
crotchet should last for exactly 1,000 ms and each quaver for 500 ms. Imagine some 
data from two keyboard performances of these notes with the IOIs shown in Figure 
5.1. Two things are evident from these data: neither performance is metronomically 
precise (the IOIs vary from their canonical values of 1,000 and 500 ms); and as a 
whole the first sequence is played slower than the indicated tempo (most of the IOIs 
exceed their canonical value), while the second sequence is played close to the indicated
tempo (there is something like a balance of IOIs that exceed and fall short
of their canonical values). It is not easy, however, to compare these two sets of data
directly: Figure 5.2 shows the raw data in a graphical form, and although it is possible
to see that the data of performance 1 tend to lie above those of performance 2
(i.e., the tempo of performance 1 is somewhat slower), the zigzag profile makes it
hard to see what is going on. There are various ways to transform the data to bring
out specific features and make comparisons between data easier: a common approach
is to divide the duration of each note by a number representing the note's
written value, usually expressed as a multiple of some underlying metrical value. For
the sequence of crotchets and quavers being considered here, crotchets are represented
by the value 2 and quavers by the value 1, and the result of dividing the two
sets of data by the appropriate sequence of ones and twos is shown in Figure 5.3.

Data point        1     2     3     4     5     6     7     8     9    10    11    12
Note value       cr    qu    qu    cr    qu    qu    cr    qu    qu    cr    qu    qu
Performance 1  1012   487   512   996   493   541  1103   533   524  1128   560   575
Performance 2   996   498   458  1029   512   487   965   521   500  1002   518   496

Figure 5.1. Sample performance data (in milliseconds) for two
performances of a simple rhythmic pattern (cr = crotchet, qu = quaver).
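
The normalization just described can be sketched in Python using the values printed in Figure 5.1. The function names are illustrative, and the proportional deviation shown at the end assumes the canonical value of 500 ms per quaver given above.

```python
# IOIs in milliseconds, as printed in Figure 5.1.
performance_1 = [1012, 487, 512, 996, 493, 541, 1103, 533, 524, 1128, 560, 575]
performance_2 = [996, 498, 458, 1029, 512, 487, 965, 521, 500, 1002, 518, 496]

# Written values in quaver units: crotchet = 2, quaver = 1.
written = [2, 1, 1] * 4


def normalize(iois, written_values):
    """Divide each IOI by its written value, giving ms per quaver unit."""
    return [ioi / value for ioi, value in zip(iois, written_values)]


def proportional_deviation(normalized, canonical_unit_ms=500):
    """Proportional lengthening (+) or shortening (-) relative to the canonical value."""
    return [(value - canonical_unit_ms) / canonical_unit_ms for value in normalized]


norm_1 = normalize(performance_1, written)
norm_2 = normalize(performance_2, written)
mean_1 = sum(norm_1) / len(norm_1)
mean_2 = sum(norm_2) / len(norm_2)
deviation_1 = proportional_deviation(norm_1)
```

On these figures the mean normalized IOI is higher for performance 1 than for performance 2, which stays close to the canonical 500 ms.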

It is now a great deal more obvious that performance 1 is slower than perform- 
ance 2, and furthermore it is also clear that performance 1 gets slower as it proceeds 
(it rises in the graph) while performance 2 stays more or less stable in terms of over- 
all tempo. Figure 5.3 is essentially a tempo map, showing the momentary tempo of 
each note for the two performances, except that what is shown is actually the inverse
of tempo (sometimes called "normalized IOI"). Because tempo is a more familiar
concept, it is common for a further transformation (inversion of the data) 4 to
be applied, with the result shown in Figure 5.4.

Figure 5.2. Raw timing data for two performances of a simple rhythmic pattern.

Figure 5.3. Normalized timing data for performances of a simple rhythmic pattern.
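
One plausible form of this inversion, sketched here with an illustrative function name, divides the number of milliseconds in a minute by the normalized IOI, giving the momentary tempo in quaver beats per minute.

```python
# IOIs in milliseconds for performance 1, as printed in Figure 5.1.
performance_1 = [1012, 487, 512, 996, 493, 541, 1103, 533, 524, 1128, 560, 575]
written = [2, 1, 1] * 4  # crotchet = 2 quaver units, quaver = 1


def momentary_tempo(iois_ms, written_values):
    """Tempo at each note, in metrical units (here quavers) per minute."""
    return [60_000 * value / ioi for ioi, value in zip(iois_ms, written_values)]


tempo_1 = momentary_tempo(performance_1, written)

# A metronomically exact rendering at 60 crotchets per minute gives a flat
# line at 120 quaver beats per minute.
exact = momentary_tempo([1000, 500, 500], [2, 1, 1])
```

Because tempo is the reciprocal of duration, a rising line in the normalized data becomes a falling line here.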

At least one more feature can be extracted from even such simple and small-scale
data: a difference in the treatment of the tempo of quavers in the two performances.
In performance 1 the second of each pair of quavers (the quavers are data
points 2 and 3, 5 and 6, 8 and 9, and 11 and 12) is played consistently slower than
the first, while in performance 2 it is the other way around, a result that could be
used as evidence either for a difference in performance style between the two sets of
data, or for a distinction between cognitive strategies in the two performers who
were the source of the data.

Figure 5.4. Timing data represented in terms of momentary tempo (in quaver beats
per minute).
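
A simple way to check this kind of observation is to subtract the first quaver of each pair from the second. The sketch below uses the values printed in Figure 5.1; the pair positions are the data points named above, and the function name is illustrative.

```python
# IOIs in milliseconds, as printed in Figure 5.1.
performance_1 = [1012, 487, 512, 996, 493, 541, 1103, 533, 524, 1128, 560, 575]
performance_2 = [996, 498, 458, 1029, 512, 487, 965, 521, 500, 1002, 518, 496]

# Zero-based positions of data points (2, 3), (5, 6), (8, 9), and (11, 12).
quaver_pairs = [(1, 2), (4, 5), (7, 8), (10, 11)]


def second_minus_first(iois, pairs):
    """Positive values mean the second quaver of the pair is longer (slower)."""
    return [iois[second] - iois[first] for first, second in pairs]


diff_1 = second_minus_first(performance_1, quaver_pairs)
diff_2 = second_minus_first(performance_2, quaver_pairs)
```

With the values as printed, every pair in performance 2 has a shorter second quaver, and three of the four pairs in performance 1 have a longer one (the pair at data points 8 and 9 runs the other way by 9 ms).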



Expression in Performance: Its Definition and Analysis 

A central preoccupation in research on performance has been the nature and func- 
tion of expression. A crucial question is how expression should be defined and char- 
acterized, and whether (and how) it might be measured. From some of the earliest 
work on performance [Seashore 1967 (1938)] has come the idea that expression 
should be defined as departures from some neutral norm, this norm itself being in- 
expressive as far as performance is concerned. For Seashore, and many others who 
have followed a Western, score-based tradition, this has translated into defining ex- 
pression as departures from the values (principally rhythmic and dynamic) specified 
in the score. The rationale for this approach is that what makes a performance ex- 
pressive is what the performer brings to the piece beyond what the composer has 
specified in the score. Thus expression consists of systematic departures from the ex- 
plicit notation of the score — whether these departures are deliberate and conscious 
on the part of the performer or not (for further discussion, see Clarke 1995). 

A problem with this approach is that it regards the score as "the piece" in a kind 
of disembodied, ahistorical fashion, apparently divorced from any of the cultural as- 
sumptions about how the notation might be understood and interpreted. For ex- 
ample, it was an assumed convention of the French Baroque to play passages of 
equally notated quavers in an alternating long-short pattern (notes inégales), and to
play dotted notes with variable amounts of over- and underdotting. An analysis which 
concluded that the performer consistently applied an expressive emphasis to the first 
of a pair of quavers, or to the dotted note in dotted rhythms, might be thought to have 
missed the point since these "departures" from the notation are implicit within the no- 
tation itself, and thus do not constitute the performer's contribution. However, exactly 
the same might be said of the overwhelming tendency for performers to slow up to- 
ward the end of phrases or sections, an equally pervasive cultural expectation within 
the Classical and Romantic repertoire — but this phenomenon has become probably 
the most intensively studied expressive property of performance. The problem comes 
down to whether expression should be regarded as the specific contribution of an in- 
dividual performer, or in more social terms — as the combination of widely shared 
cultural norms with the particular input of the individual performer. If it is the for- 
mer, then where one places the boundary between cultural norms and individual ex- 
pression becomes a critical consideration; if the latter, then a more relaxed attitude to 
the source of the phenomenon (cultural convention or individual intention) is pos- 
sible. In practice, few researchers have devoted much attention to these issues. 

A more concrete methodological issue is how to distinguish between random or 
accidental variations in timing, dynamics, or articulation, and properties that might 
legitimately be considered expressive. Distinguishing deliberate (hence
expressive) features from mistakes is problematic: the most widespread solution
adopted in the literature is to make use of the principle of reproducibility, and to pay 
attention only to those features that emerge from the average of a number of repeated 
performances, rather than individual data points. While this approach is consistent 
with the standard approach to empirical data used in the behavioral sciences, it runs 
counter to a fundamental principle in musical performance — the idea that perform- 
ance is a recreative, rather than reproductive, act: each performance is a specific re- 
alization of a piece of music, and there is no reason why any two such realizations 
should converge toward identity. Variation between notionally "identical" renderings 
of the same piece cannot therefore be regarded as random, and the average of a set 
of performances, by collapsing together distinct approaches to the piece, may have 
little value. 
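
The averaging approach, and the caution attached to it, can be illustrated with a short sketch. The data here are invented, and the function name is hypothetical. Alongside the mean profile, the standard deviation at each note position gives some indication of where the individual performances diverge, which is exactly the information that the average alone conceals.

```python
from statistics import mean, stdev


def average_profile(performances):
    """Mean and standard deviation of the IOI at each note position.

    performances is a list of equal-length IOI sequences (in ms), one per
    repeated performance of the same passage.
    """
    positions = list(zip(*performances))
    means = [mean(values) for values in positions]
    spreads = [stdev(values) for values in positions]
    return means, spreads


# Three invented performances of a three-note passage.
repeats = [
    [500, 520, 600],
    [510, 500, 640],
    [490, 510, 620],
]
means, spreads = average_profile(repeats)
```

A large spread at a given position may reflect error, but it may equally reflect deliberately different readings of the passage, and the numbers alone cannot distinguish the two.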

An alternative is to use internal consistency within a single performance, and to 
analyze the data guided by principles of musical coherence to distinguish expressive 
features from errors. There are still problems here: first, internal consistency within 
a single performance is subject to the same "uniqueness" argument as applies to sep- 
arate performances (there may be good reasons to play the same, or a similar, pas- 
sage occurring in two places with deliberately different expressive features); and sec- 
ond, musical analysis is not an exact science and cannot be relied upon to provide 
an unequivocal basis for distinguishing between errors and intentions. The approach 
actually adopted in the published literature has often depended on the nature of the 
task and the quality of the data. When the performance data come from relatively 
simple musical materials, collected under controlled conditions with performers 
who are not of concert standard, repeated performances and groups of subjects have 
been used, with standard statistical methods (e.g., Clarke 1993). By contrast, when 
the data are from concert pianists performing concert repertoire, there is often no 
opportunity to collect repeat performances, and in these cases the authors may de- 
pend on the expertise of the performers and the consequent standard of the data to 
justify the abandonment of standard statistical methods (e.g., Palmer 1996). A fur- 
ther possibility, demonstrated in work by Widmer (2002), is to apply data mining 
methods to large bodies of data taken from skilled performers in order to extract 
general principles from unique performances that all belong to a narrowly focused 
performance style and repertoire. 

Example: Chopin's Prelude, Op. 28 No. 4, in E Minor 

The data shown in Figure 5.5 come from an analysis of two performances of Cho- 
pin's E Minor Prelude by a concert pianist playing on a MIDI grand piano (Clarke 
1995). The performer was a professional pianist who played on a Yamaha MIDI 
grand piano in one of the teaching rooms of the Music Department at City Univer- 
sity, London. The only people present were the pianist himself and three researchers 
involved in the study. The performer was originally asked to provide three perform- 
ances of the prelude in succession, but for a variety of reasons (not least his own in- 
terest in the study) he ended up playing the piece six times in the space of about an 
hour. There were no instructions as to how he should play it on any occasion, and 
between performances he spontaneously provided an idiosyncratic commentary on 
his playing. The performances therefore took place in a rather unusual (even artificial)
situation, but one with which the performer seemed entirely comfortable. The
data were recorded on a desktop Macintosh computer using a commercial sequencer 
(Vision), and were processed using POCO and a statistics program (Statview) in the 
ways that have already been described. 

The piece consists of a rhythmically differentiated right hand melodic line ac- 
companied for the most part by a constant stream of three-note left hand quaver 
chords, 5 and this immediately raises questions about how the data should be exam- 
ined and represented. The data from the right hand are an obvious target for analy- 
sis, since the right hand (particularly in this kind of texture) is conventionally re- 
garded as the primary carrier of expression, and in this piece consists of a stream of 
single notes. The disadvantage is that it provides rather sparse and intermittent data 
(many bars contain only two notes — a dotted minim and a crotchet) compared with 
the regular quavers of the left hand. The problem with the left hand is that it con- 
sists of three- and four-note chords, each note of which has its own dynamic and 
timing value. One solution is to simplify the left hand data by regarding each chord 
as a unit, resulting in a single timing and dynamic measure for each chord. There are 
various ways in which this might be done, but for the purposes of this example the 
dynamic value for each chord is represented by the average dynamic value of the 
notes played in the chord, and the IOI for the chord is calculated as the time from 
the first note of the chord to be struck (whichever note that is) to the first note of the 
subsequent chord (whether that note is in the same voice or not). The rationale for 
this is that the most fundamental rhythmic property of any event sequence is its 
attack-point pattern, and that the primary dynamic attribute is a property of the 
whole chord regarded as a fused entity. Figure 5.5 thus shows sequences of single 
data points for the tempo and dynamic level of each chord in the left hand, for each 
of two performances. 
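
The chord reduction described above might be sketched as follows. The input format (each chord as a list of onset and velocity pairs, with notes already grouped into chords) and the function name are assumptions made for illustration, and the data are invented.

```python
def reduce_chords(chords):
    """Reduce each chord to a single (IOI, mean velocity) pair.

    chords is a list of chords in playing order; each chord is a list of
    (onset_ms, velocity) pairs, one per note. The IOI runs from the earliest
    onset in a chord to the earliest onset in the next chord, whichever
    voices those notes belong to. The final chord has no IOI (None).
    """
    first_onsets = [min(onset for onset, _ in chord) for chord in chords]
    reduced = []
    for index, chord in enumerate(chords):
        mean_velocity = sum(velocity for _, velocity in chord) / len(chord)
        if index + 1 < len(chords):
            ioi = first_onsets[index + 1] - first_onsets[index]
        else:
            ioi = None
        reduced.append((ioi, mean_velocity))
    return reduced


# Three invented left hand chords with slightly spread onsets.
chords = [
    [(0, 40), (6, 46), (11, 43)],
    [(498, 42), (495, 45), (503, 48)],
    [(1010, 50), (1004, 44), (1001, 47)],
]
reduced = reduce_chords(chords)
```

Other reductions are possible (for example, taking the onset and velocity of the top note only), and the choice affects the resulting timing and dynamic profiles.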

The large-scale, two-part shape of the performances can be seen in the tempo 
and particularly the dynamic data; the bar-by-bar pattern of expressive timing can 
be plainly seen at the start; and various differences between the expressive shaping 
of the two performances can also be seen. In the original publication (Clarke 1995), 
these differences were used to argue for two different interpretations of the piece (ar- 
ticulated in these two performances) relating to an ambivalence in its underlying for- 
mal structure. The purpose of the example here has been simply to illustrate the 
kinds of data that can be gathered and some of the questions that they can be used 
to address. 



The Limitations of MIDI 

A number of sources of information available from MIDI were not considered in the 
study described above (pedaling, articulation, synchronization within chords and between
parts), while other features of piano performance are impossible to study using
MIDI: because MIDI data are derived from the mechanism of the keyboard action 
and not the sound, anything that relates to acoustical properties of the instrument is 
excluded from the kind of analysis shown above. This includes a whole variety of
timbral properties, which are the consequence of complex interactions between simultaneously
and consecutively sounding strings and the whole body of the instrument,
as well as of pedaling and the specific actions of the hammers and dampers.
These are, of course, properties over which a skilled performer exercises considerable
control, and may even be regarded as the primary expressive parameters in certain
kinds of repertoire: MIDI data from a performance of Steve Reich's Piano Phase,
for example, would entirely miss the resonance and "streaming" effects that take
place between the two pianos in a performance of the piece, and are arguably its
most important features.

Figure 5.5. Tempo and dynamic data from two expert performances of the Chopin
Prelude in E minor, Op. 28 no. 4.

Then there are the physical and social dimensions of the performance: neither 
the interactions with any listeners who may be present, nor a whole variety of visual 
data (most obviously the movements, demeanor, and facial expressions of the per- 
former) can be considered. Some of these have been studied separately (see below), 
but there is little work that tries to bring these different sources of information to- 
gether to provide a more multidimensional view of performance (though see Clarke 
and Davidson 1998, for a preliminary and partial attempt, and Juslin, Friberg, and 
Bresin 2001-2002 for a rather different approach). 

Finally, there is one simple but rather practical limitation to the use of MIDI: the 
performances have to be new performances given by living performers who are will- 
ing to give up their time to give a performance on a MIDI instrument of some sort. 
This immediately excludes the huge resource and historical perspective that
commercial recordings can provide, and also limits research to the study of those
performers who are willing to take part in these "research" performances, often
recorded in unnatural circumstances (usually without an audience or any sense of
occasion) and on instruments that may not be of concert standard. For this reason,
methods have been developed for obtaining performance data from existing record- 
ings, as the next section describes. 



Performance Data from Recordings 

The attractions of working with recordings are self-evident: not only is there no
other way to study Cortot or Michelangeli, but you can also deal with instruments 
other than the piano, and you are working directly on the real artifacts of musical 
culture. Moreover an enormous heritage of recordings, now going back over a cen- 
tury, is becoming easily accessible to scholars, partly through CD reissues of histor- 
ical recordings, but also through the efforts of major collections such as the British 
Library Sound Archive (London). 

There are, however, significant drawbacks. One is that the data are much more 
difficult to extract than MIDI data: retrieving useful information is at best laborious, 
and at worst impossible. The other is that historical recordings present problems of 
interpretation comparable to those that apply to other forms of documentary evi- 
dence, but less well recognized owing to the comparative novelty of this area of re- 
search. One such problem is the fact that recording speeds were not standardized in 
early recordings (and were in any case subject to mechanical variation). Early re- 
cording techniques were also highly intrusive: in the mechanical era (up to 1925) 
players had to crowd around a large horn, and this not only drastically affected the 
balance between instruments but also disrupted the social environment of normal 
performance. (Such recordings may be no more true to life than the carefully posed 
studio portraits of Victorian families, with their suspiciously well-scrubbed and im- 
maculately dressed children.) Ironically, when the technology to make high-fidelity 
recordings became available, with Decca's full-frequency-range recordings (ffrr) 
from 1945 and the adoption of magnetic tape from 1950, it was rapidly used to cre- 
ate recordings of performances that had never been, by means of editing techniques: 
"it was not uncommon," writes Day (2000: 26), "for the master-tape for an LP, last- 
ing perhaps fifty or fifty-five minutes, to be the result of 150 splices." This in no way 
invalidates the recording as an object for musicological study: it may not be a direct 
representation of an actual performance, but at the same time such recordings rep- 
resent one of the principal forms in which music was made available in the twenti- 
eth century, and was consumed by an ever-increasing public. The point to be made 
is simply that there is nothing straightforward or transparent about recordings as 
historical sources. 

There have been two approaches to obtaining performance data from existing 
sound recordings: one has been to try to derive something like the same level of de- 
tail that MIDI recordings provide, while the other has been to focus on a slightly 
coarser-grained approach, usually limited to timing. The first of these two approaches 
has been used most extensively by Repp in a stream of publications investigating in 
some cases very large repertoires of recorded performances (e.g., Repp 1998, 1999). 
His method has been to digitize the music and then use a digital waveform editor
(SoundEdit16) to display the waveform at a suitable level of temporal resolution. A
cursor is positioned at clearly recognizable note onsets and used to label those 
points, using auditory feedback from playing the waveform to decide on the onsets 
of any difficult notes. These labels identify the times of note onsets, and are saved to 
a file for subsequent analysis. The method is primarily designed to retrieve IOIs, but
has also been used to measure asynchronies among nominally simultaneous note 
onsets. A meaningful analysis of dynamic levels is even more difficult to derive from 
sound recordings than is timing, because of the problem of estimating the effective 
dynamic level of simultaneously occurring events (multinote chords). Repp has, 
however, undertaken such a study (Repp 1999) using software (Signalyze) that pro- 
vides the amplitude envelope of the signal. His approach is to take the peak ampli- 
tude of the envelope after the onset of a note (or chord) as the index of its dynamic 
level, and to make no attempt to separate out different streams (voices) within the 
texture; 6 clearly the appropriateness of this approach depends on the texture of the 
music, and it would be of dubious value in a polyphonic context. Repp's method can 
in principle be used with any instrument or combination of instruments, although 
it becomes increasingly difficult to use as the texture becomes more complex, or 
with instruments that have indistinct note onsets (e.g., the voice). It should also be 
noted that it is laborious and time-consuming, involving a number of operations to 
extract the data for each single event. 
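
As a rough illustration of the two measurements described above, the following Python sketch derives IOIs from a list of labeled onset times and takes the peak of an amplitude envelope within a window after each onset. The function names, the search window, and the sample data are assumptions made for illustration; they are not taken from Repp's procedure or from the software he used.

```python
def inter_onset_intervals(onsets):
    """Return the intervals (in seconds) between successive labeled onsets."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


def peak_after_onset(envelope, sample_rate, onset, window=0.1):
    """Return the largest envelope value in the window following an onset.

    envelope: amplitude-envelope samples (non-negative numbers)
    sample_rate: envelope samples per second
    window: search window in seconds (an assumed value, not Repp's)
    """
    start = int(round(onset * sample_rate))
    stop = min(len(envelope), start + max(1, int(round(window * sample_rate))))
    return max(envelope[start:stop])


# Hypothetical data: three onsets and an envelope sampled 10 times per second.
onsets = [0.0, 0.5, 1.2]
envelope = [0.1, 0.8, 0.6, 0.3, 0.2, 0.4, 0.9, 0.5, 0.3, 0.2, 0.1, 0.1, 0.7, 0.6, 0.2]

iois = inter_onset_intervals(onsets)
peaks = [peak_after_onset(envelope, 10, t, window=0.3) for t in onsets]
```

Like the approach described in the text, this treats each chord as a single event: the peak is taken from the whole signal, with no attempt to separate voices.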

A second approach has been to focus on timing at a variety of levels, and to ex- 
tract the timing information by tapping on some device (usually a computer key- 
board, or a MIDI instrument connected to a computer) in synchrony with the music 
(e.g., Cook 1995, Bowen 1996, and for a study that also considers dynamic shaping 
see Martin 2002). This "tap along" method has the merits of simplicity and economy, 
but clearly depends on the precision with which the researcher can synchronize with 
the music. For relatively large musical units (sections, phrases, sub-phrases, and 
even bars) this is not a problem, since the synchronization error will be a small pro- 
portion of the measured unit. However, for smaller units (beats or single notes) it 
can become a serious problem, and the method has generally not been attempted 
below the level of the beat. One reassuring point is that the synchronization error is 
not cumulative, since it is monitored and adjusted at every unit onset, and a check 
on the reliability of the method can be made either by making multiple passes at the 
material, or by enlisting independent judges. 7 
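
One simple way to carry out the reliability check mentioned above is to compare several tapping passes over the same passage. The Python sketch below is an assumption about how that might be done, not a procedure taken from the studies cited: it averages the tap times for each unit across passes and reports the mean absolute deviation from that average. The data are invented.

```python
from statistics import mean


def tap_reliability(passes):
    """Summarize agreement between repeated tapping passes.

    passes: list of passes, each a list of tap times (seconds), one per unit
    Returns (mean tap time per unit, mean absolute deviation per unit).
    """
    if len({len(p) for p in passes}) != 1:
        raise ValueError("every pass must contain the same number of taps")
    averages = [mean(taps) for taps in zip(*passes)]
    deviations = [
        mean(abs(t - avg) for t in taps)
        for taps, avg in zip(zip(*passes), averages)
    ]
    return averages, deviations


# Three invented passes over the same four bar onsets.
passes = [
    [0.00, 2.00, 4.10, 6.00],
    [0.06, 1.94, 4.04, 6.06],
    [0.03, 2.03, 4.07, 6.03],
]
averages, deviations = tap_reliability(passes)
```

Units whose deviation is large relative to their duration are the ones where small tempo changes should not be over-interpreted.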

Example: Beethoven's Ninth Symphony 

In a chapter broadly concerned with the relationship between analysis and performance,
Cook (1995) explores Furtwängler's conducted performances of Beethoven's
Ninth Symphony in the context of Schenker's analysis of the music. The study fo- 
cuses on tempo in performance, and uses the "tap along" method to extract bar-by- 
bar tempo information for the first movement of two live performances available on 
CD (the use of live recordings helped to minimize the problems resulting from edit- 
ing, which are particularly acute in studies of performance timing). The technique 
involves playing the CD in the CD-ROM drive of a computer, and tapping the space
bar of the computer keyboard in synchrony with the onset of each bar. The computer 
then stores the times for the inter-tap intervals in a file, which can be shown as a bar- 
by-bar tempo chart. Cook (1995: 114) observes: 

Because it is sometimes difficult to decide exactly where the downbeat falls, 
and because of factors of motor control, this process is not entirely reliable. 
In repeated tests based on the same passage of music, my responses varied 
by an average of 60 milliseconds, which is around 3 percent of the total du- 
ration of each bar; deviations of 100 milliseconds (5 percent) were quite fre- 
quent. For this reason it would be foolish to make too much of small tran- 
sitions which appear on the bar-to-bar level; the data are not sufficiently 
accurate. But the discrepancies are not cumulative, and this means that infer- 
ences regarding the broad shaping of tempo (which is what [the] chapter is 
about) are robust. 
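
The conversion from tap times to a bar-by-bar tempo chart is a matter of simple arithmetic, sketched below in Python. The function names and the sample tap times are illustrative assumptions, and tempo is expressed here in bars per minute because the taps mark bar onsets. The last line works through the proportion implied in the quotation: if 60 milliseconds is about 3 percent of a bar, the bars last roughly two seconds.

```python
def bar_durations(tap_times):
    """Inter-tap intervals in seconds: one duration per completed bar."""
    return [b - a for a, b in zip(tap_times, tap_times[1:])]


def bars_per_minute(durations):
    """Convert each bar duration (seconds) to a tempo in bars per minute."""
    return [60.0 / d for d in durations]


def relative_error(error_seconds, bar_seconds):
    """Timing error as a fraction of the bar duration."""
    return error_seconds / bar_seconds


# Invented tap times (seconds) at five successive bar onsets.
taps = [0.0, 2.0, 4.5, 6.5, 8.0]
durations = bar_durations(taps)       # [2.0, 2.5, 2.0, 1.5]
tempi = bars_per_minute(durations)    # [30.0, 24.0, 30.0, 40.0]

# A 60 ms deviation expressed as a fraction of a 2-second bar.
print(relative_error(0.060, 2.0))
```

Because each duration is the difference between two neighboring taps, an early or late tap lengthens one bar and shortens the next, which is why the discrepancies do not accumulate over longer spans.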

The study goes on to use the data to refute the assertion that Furtwängler's tempo
fluctuations were arbitrary and uncontrolled, and to show that his performances 
correspond to Schenker's analytical view of the piece. Cook (1995: 109) points out 
that, simply from listening to the performances, it appears that 

Furtwängler shapes his phrases, balances his instrumentation, or articulates
formal junctures in ways which do match what Schenker says, or that at least 
seem to belong within the same language of performance that Schenker is 
talking. But judgments of this sort are inevitably vague and impressionistic, 
especially in view of the limited sound quality and control over balance that 
was possible in live recordings around 1950. And close listening, by itself, is 
even more inadequate when it comes to large-scale tempo modifications; 
waves of accelerando and decelerando are clearly audible, but it can be diffi- 
cult to disentangle them from dynamics, articulation, tonal quality, and all the 
other dimensions that contribute to the energy or tension level of the music. 
Unlike these other dimensions, however, tempo relationships can easily be 
measured in an empirical and reasonably accurate manner. 

As with the previous example, the intention here is not to raise the issues that are
the concern of the original study, but simply to point out that the method is viable
and useful, and permits a kind of discussion that would be hard or impossible to
sustain without the empirical data that the method provides. One of Cook's main
arguments in the original study is that Furtwängler makes use of relatively large-scale
arch-shaped tempo profiles that allow him to use tempo for both structural and
rhetorical purposes in performance. Without the tempo graphs which the "tap along"
method provides, it would be impossible (or at the very least verbally cumbersome
and far more open to dispute) to demonstrate the particular characteristics of
performance that are used to argue both for the particular audible characteristics of
Furtwängler's performances, and for the existence of a specific history and ideology
of conducting that Cook traces back to Wagner and Bayreuth.



Evaluative and Qualitative Methods

Measuring the timing, dynamic, and even timbral properties of performances has 
produced a wealth of previously unknown information but, as has already been 
pointed out, it gives only a very partial view of what happens in performance. In par- 
ticular, such an approach entirely misses the social dimension of performance (the 
interactions between performers, and between performers and others, discussed in 
Davidson, chapter 4, this volume). While it is possible to quantify some of these 
characteristics, there is much to be gained from adopting a qualitative approach. 8 A 
method which owes a lot to work in psychotherapy is to get performers to speak 
about their own performances, and then to analyze both what they say and what 
they do. As a research method this "talking analysis" is aimed at discovering the in- 
tentions, motivations, and evaluations of one or more performers in relation to their 
own (or another's) performance. The data for this kind of approach are usually of 
two kinds: first, a sound recording of one or more performances, and second, a 
sound recording of the commentary by one or more of the original performers, or an- 
other commentator. The commentary is often made by a person who listens to the 
original sound recording of the performance, stopping the recording as often as he 
or she likes to make whatever comments are appropriate, and possibly doing so on 
more than one occasion. In this way a detailed account can be built up outside the 
pressure of "real-time" performance. The subsequent treatment of the resulting com- 
mentary can take many forms (see Davidson, chapter 4, this volume; Robson 1993), 
usually concerned in one way or another with a content analysis. 

An example of this approach is the work of Sansom (1997), who studied mu- 
sical and personal interactions in free improvisation. The study involved a number 
of pairs of performers who improvised together, the participants being experienced 
improvisers who had played music with one another previously. Sansom recorded 
each improvising duo, and then on subsequent occasions asked each of the two 
players individually to listen to the recording and to stop it and comment on any- 
thing that occurred to them about what was happening or what they had been 
thinking about at the time. These commentaries were also recorded and then sub- 
jected to a content analysis. The following (Sansom 1997, ii: 9) is an extract from 
the comments provided by one of the members of a guitar and saxophone duo — 
two players who are very experienced improvisers and have frequently played to- 
gether in the past: 

Sort of little funk chords come out there for some reason, it took me 
quite a lot by surprise. I quite like the sound, I reiterated them a few times . . . 
I never usually do . . . techniques so recognizable as that, very rarely. . . . 
Another thing from that, I don't really play on the beat, I play in the bits in 
between. . . . And often on playback it sounds a bit weird 'cos it's just like a 
little time gap between you — the same rhythm but not together on it. I was 
thinking about it there because the melodic thing isn't important, we were 
both skidding about anywhere. We both decided to play around with the 
rhythm and it sounded a bit messy but er . . . 
I really enjoyed that bit . . . sort of juxtaposing really brutish behaviour
with really small, er, considerate little things ... I can't remember what comes 
next, I hope it's something very quiet — we'll see. . . . 

Right. Mick didn't feel like stopping at the same time as I did . . . er . . . 
obviously a certain energy that I was feeling diminished. I thought it was a 
good time to stop. At a time when Mick was still on a high, so I started to lis- 
ten to see what he would do. How far he would go on without me comment- 
ing. I'd a fair idea he would go on quite a distance ... I'd no idea what he'd 
make it into. . . . 

The analysis of this and similar passages provided by Sansom's participants focuses 
on both musical and interpersonal processes. As the extract demonstrates, the play- 
ers' transcripts constitute a complex mix of recollection, current comments, aware- 
ness of musical materials, and awareness of interpersonal dynamics. Sansom's con- 
tent analysis categorizes the remarks into themes that then form the basis for a 
discussion of creativity and interaction in ensemble improvisation. The advantage of 
a qualitative approach such as this is its potential to investigate an enormous diver- 
sity of phenomena that the quantitative methods discussed earlier are incapable of 
addressing, including issues of motivation, intention, reaction, and evaluation. 

A difficulty, however, is the way in which empirical method is combined with 
interpretation. Every empirical method has an interpretative component, but in 
most quantitative methods the premises and boundaries of interpretation are usu- 
ally fairly well recognized. Qualitative methods, however, have been accepted for a 
very much shorter time, and have not yet acquired the stability (it may simply be fa- 
miliarity) of their quantitative counterparts. When a quantitative test gives a certain 
outcome, and subsequent discussion then interprets that result and places it in the 
context of other work, it is fairly clear (if one accepts the premises of the method) 
where the reporting of results stops and interpretation starts. This is much less clearly 
the case with a qualitative method such as that used by Sansom, where the assign- 
ment of verbal data to content categories is itself a strongly interpretative process. 
The objection leveled at qualitative research of this kind is that it is too speculative — 
that it sets itself up as empirical, and then goes about its business in a manner that 
looks more like literary criticism. This objection partly reflects the fact that the inter- 
pretative assumptions of most quantitative methods have simply become so deeply 
embedded as to be invisible, but it remains the case that qualitative methods have 
yet to attain the systematic and explicit character of empiricism in the eyes of many. 
Robson (1993) and Smith, Harré, and van Langenhove (1995) provide fuller accounts
of this debate. 



Analyzing Visual Components of Performance 

Performance is not only a sonic event, and the visual component of performance of- 
fers a rich domain for empirical study. In a number of publications, Davidson (1993, 
1994, 1995) showed how observers could make accurate judgments of the expressive
properties of performances on the basis of video data. The participants in her
studies watched and listened to a number of different types of material: normal video
with or without sound, video with sound from a different performance overdubbed, 
and what are called "point-light displays." These last are video recordings made with 
the performer wearing small spots or strips of reflective material on the major joints 
(wrist, elbow, shoulder, hip, knee) and head in strong illumination, and then played 
back on a monitor with the contrast turned to maximum. The resulting image con- 
sists simply of spots of light, with the rest of the body and instrument invisible. Ear- 
lier work in psychology (e.g., Runeson and Frykholm 1983) showed that viewers are 
able to perceive significant attributes (such as gender, physical effort, and intention) 
from a point-light display, suggesting that this information is conveyed by the dy- 
namics of the moving display. Davidson showed similarly that viewers were able to 
distinguish accurately between expressionless, normal, and exaggerated perform- 
ances by a variety of instrumentalists on the basis of as little as two seconds of point- 
light video without sound. She also found that, when sound was present, nonmusi- 
cians tended to be less influenced by it than by the video image: for example, when 
a point-light video of an exaggerated performance was overdubbed with the sound 
of the same person playing in an expressionless manner, nonmusicians tended to 
rate the performance toward the exaggerated end of the scale — whereas their ratings 
were toward the expressionless end of the scale when the image was from the ex- 
pressionless performance and the sound from the exaggerated condition. 

Other studies by Davidson (e.g., Davidson 1994, 1995) used ordinary video 
images and were designed to investigate whether a performer made use of a fixed 
repertoire of expressive gestures in performance, and whether viewers could iden- 
tify their meaning. The technique involved making video recordings, from a fixed 
position, of a performer (in this case a pianist) who played a mixture of repertoire. 
The video recordings were shown to viewers who were asked to identify specific 
types of movement gesture, and give some indication of what they thought these 
gestures "meant" in terms of their relationship with the music. A variant of this 
method was a study (Clarke and Davidson, 1998) that combined the type of MIDI 
data analysis described earlier in this chapter with a movement analysis taken from 
a video recording. In this case, the movement analysis involved the detailed track- 
ing of a pianist's head movements using specialized video analysis equipment. The 
resulting continuous plot of head position was then compared with MIDI data from 
the piano in an analysis of the relationship between expressive movement, and ex- 
pressive timing and dynamics. 

At a still more detailed level, Sloboda, Clarke, Parncutt, and Raekallio (1998) 
used video to record the hand movements of pianists as part of a project on keyboard 
fingering. A camera was mounted directly above the keyboard of a MIDI instrument, 
with powerful sideways illumination to highlight (with light and shadow) which fin- 
ger depressed which key. The video recording was then used to match the finger that 
depressed each key with the corresponding note in the MIDI output. There is also 
the potential to use rather more sophisticated and automated methods in analyzing 
movement in performance. A preliminary study by Winold, Thelen, and Ulrich 
(1994) investigated the bow-arm movements of cellists, using a method which em- 
ployed an optoelectronic motion analysis system, in which the position in space of 
various specified points on the player and instrument (just the bow in this case) was
continuously tracked by computer. The technique has the potential to provide data
on the continuous movements of instrumentalists and conductors during perform- 
ance, and offers the prospect of a much more diverse and detailed study of move- 
ment in performance than has so far been undertaken. 

Example: The Detection of Expression in Point-Light Displays

Having discovered that viewers can pick up the expressive intentions of performers 
from point-light displays, Davidson (1994) studied which parts, or combinations of 
parts, of the body are particularly "informative" in making expressive judgments. The 
design of this study was that a single performer (a pianist) played a short extract of 
music in three different performance manners (deadpan, expressive, exaggerated), 
and with seven different arrangements of point-light patches attached to his body: 

1. All patches: head, right shoulder, right elbow, and both wrists 

2. All patches except elbow 

3. Head and wrists only 

4. Wrists only 

5. Head only 

6. Elbow only 

7. Shoulder only 

This is clearly not an exhaustive list of all the possible combinations, but it rep- 
resents the maximum that could be reasonably investigated given that the pianist 
had to play the music in each of these point-light arrangements, and in each of the 
performance manners, resulting in 21 (3 × 7) performances. Video recordings were
made of all 21 performances, and were shown (without sound) in different random 
orders to a total of 15 observers (graduate music students) who were instructed to 
rate each performance on a seven-point scale of expressivity (from 1 = inexpressive 
to 7 = highly expressive). 

The results showed that the observers were reliably able to identify the three dif- 
ferent manners of performance; that the different point-light arrangements resulted 
in significantly different patterns of scores; and that the scores for the three versions 
were different for each point-light arrangement. From this somewhat complex pat- 
tern of findings, one simple discovery is that the head and wrists seem to convey 
most of the expressive information: the "head and wrists only" arrangement gave re- 
sults that were essentially identical to those for the "all patches" condition, and even 
"head only" gave a pattern of scores that was close to that for "all patches." "Wrists
alone," however, seemed to give very misleading information, with all three per- 
formance manners eliciting rather similar (and inappropriately high) expressivity 
ratings. Some of the observers' comments indicated that it was the level of activity in 
the wrists (largely a consequence of the movements necessary simply to play the 
piece) that led to consistently high ratings. As one subject put it (Davidson 1994:
297-298): "I suspect I'm rating this wrist performance as highly expressive just
because there is plenty of action. When wrists are combined with other areas of the
body, I look to the whole, and the emphasis on fast motion is reduced for I see an
elegant arch of the back and a delicate hand lift and it is from the combined
information that I am able to recognize the intention of the piece." The study, then, shows
how relatively simple techniques can be used to investigate how different parts of 
the body are involved in communicating expression, and what relationship there 
might be between quantitative and more qualitative factors in this process. Since this 
empirical work was done, the available technology has developed enormously, and 
rather than asking the performer to play 21 times, with different point-light arrange- 
ments for each, it would now be possible simply to record three performances (dead- 
pan, expressive, and exaggerated), each with the full array of point lights, and then 
use computer editing facilities to edit out individual point-lights, allowing all pos- 
sible combinations to be investigated, and ensuring that each performance manner, 
of which each edit was a variant, really was constant. 
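
To give a sense of the scale of the full design that such editing would make possible, the short Python sketch below enumerates every non-empty combination of the five patches listed above (head, right shoulder, right elbow, and the two wrists) and crosses them with the three performance manners. Treating the two wrists as separately removable patches, along with the labels and variable names, is an assumption made for illustration.

```python
from itertools import combinations

patches = ["head", "right shoulder", "right elbow", "left wrist", "right wrist"]
manners = ["deadpan", "expressive", "exaggerated"]

# Every non-empty subset of patches that could be left visible after editing.
arrangements = [
    subset
    for size in range(1, len(patches) + 1)
    for subset in combinations(patches, size)
]

# One edited clip per (manner, arrangement) pair, all cut from three recordings.
clips = [(manner, arrangement) for manner in manners for arrangement in arrangements]

print(len(arrangements), len(clips))
```

Under these assumptions there are 31 arrangements and 93 clips, compared with the 7 arrangements and 21 performances of the original study, and every clip within a manner derives from the same recorded performance.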



Performance Models and Algorithms 

The availability of powerful desktop computers and associated software has made 
new empirical methods possible, and has also facilitated the construction of models 
of musical performance. Artificial models of performance have played a significant 
part in the development of empirical approaches and have often been closely con- 
nected with empirical work. When research in the late 1970s demonstrated that mu- 
sical performance possessed a variety of systematic features, it became an attractive 
proposition to see whether these features could be embodied in, and simulated by, 
an artificial model of some kind: Sundberg, Frydén, and Askenfelt (1983), Clynes
(1983), and Todd (1985) represent some of the earlier examples of this kind of 
work. While the model developed by Sundberg and his coworkers was based on 
their own intuitions, and makes no claim to be derived from explicit empirical
evidence, Todd's various models (Todd 1985, 1989, 1992) come from analyses of
performance data, and have in turn motivated and directed subsequent empirical
studies using one or other model as their starting point (e.g., Windsor and Clarke 1997).

The function and purpose of such models have been misunderstood by some — 
at times because of inappropriate claims made by their authors. As with the major- 
ity of work in artificial intelligence, the primary purpose of a model is as an ex- 
planatory device: the model embodies certain psychological principles, and the aim 
is to draw conclusions from its successes and failures about the principles on which 
it is based. The outputs are of comparatively little significance in themselves: a model 
of expressive performance, for example, has little value as a source of expressive per- 
formances, but has the potential to identify important issues in the theory on which 
it is based. 

The purpose of this brief discussion is neither to explain in detail how any of 
the various models work, nor to chart their successes and failures, but simply to note 
some general characteristics of the approaches that they embody, and their implica- 
tions for the empirical study of performance. The models in this field can be divided 
into those that approach the subject by means of a single explanatory principle (e.g.,
Todd 1985) and those which adopt a "multi-rule" approach (e.g., Sundberg, Friberg,
and Frydén 1991; Juslin, Friberg, and Bresin 2001-2002). Todd (1985, 1989, 1992)
has proposed a number of variants of a model of expressive timing and dynamics 
based on sensitivity to phrase structure, and implemented by means of an algorithm
(a rule system) that uses tempo and dynamic curves based on pendular motion. The 
algorithm takes the pitches and rhythms of the score as its input, along with a hier- 
archical phrase structure analysis of the music, supplied by the user. 9 A tempo and 
dynamic curve is applied to each phrase unit at every level of structure, the result 
being a pattern of tempo and dynamics that directly reflects the hierarchical impor- 
tance of every phrase boundary; the user can direct the model to work within more 
or less extreme limits of tempo and dynamic variation, and to pay more or less at- 
tention to different levels of the phrase structure. The output of the model is a file 
specifying the precise temporal and dynamic value (i.e., the "expressive" value) of 
each note from the original input file. 
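
The general idea of applying a curve to every phrase at every level can be sketched in Python as follows. This is emphatically not Todd's algorithm: the parabolic curve shape, the additive combination of levels, the weights, the depth parameter, and all names are assumptions chosen only to show how a hierarchical phrase analysis could be turned into a note-by-note tempo profile.

```python
def phrase_curve(position):
    """Parabolic arch: 0 at the phrase boundaries, 1 at the midpoint.

    position: location within the phrase, from 0.0 (start) to 1.0 (end)
    """
    return 4.0 * position * (1.0 - position)


def tempo_profile(onsets, levels, weights, base_tempo=60.0, depth=0.2):
    """Return a tempo (beats per minute) for each note onset.

    onsets: score positions of the notes, in beats
    levels: one list of (start, end) phrase spans per hierarchical level
    weights: relative weight of each level (same length as levels)
    depth: maximum proportional slowing at a boundary (assumed value)
    """
    total = sum(weights)
    profile = []
    for onset in onsets:
        shape = 0.0
        for spans, weight in zip(levels, weights):
            for start, end in spans:
                if start <= onset < end:
                    shape += weight * phrase_curve((onset - start) / (end - start))
                    break
        # shape / total lies between 0 (boundary at every level) and 1.
        profile.append(base_tempo * (1.0 - depth * (1.0 - shape / total)))
    return profile


# Invented example: one 8-beat phrase divided into two 4-beat phrases.
onsets = [0, 1, 2, 3, 4, 5, 6, 7]
levels = [[(0, 8)], [(0, 4), (4, 8)]]
tempi = tempo_profile(onsets, levels, weights=[1.0, 1.0])
```

In this toy example the tempo is lowest at the start of the top-level phrase and dips less deeply at the internal boundary on beat 4, so boundaries shared by more levels receive more slowing. Changing the weights corresponds to paying more or less attention to a given level of the phrase structure.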

The model developed by Sundberg and his coworkers (see, e.g., Sundberg, 
Friberg, and Frydén 1991) is rather different in conception and operation. Rather
than using a single principle that is responsive to hierarchical structure, it consists 
of a collection of separate and much more specific rules, some of which operate 
globally and have no sensitivity to structure, while others are responsive to local fea- 
tures of the music. An example of the former is the rule "the higher, the louder" 
which increases the dynamic level of a note in proportion to its pitch height; an ex- 
ample of the latter is the "melodic charge emphasis" rule, which increases the dura- 
tion, dynamic, and degree of vibrato applied to a note according to the harmonic re- 
moteness of the note from its local tonic. 
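
A rule of the global kind can be written in a few lines. The Python sketch below illustrates "the higher, the louder" on MIDI-style data; the reference pitch, the gain per semitone, and the use of MIDI velocity as the measure of dynamic level are assumptions made for illustration, not values from the rule system of Sundberg and his coworkers.

```python
def higher_louder(notes, reference_pitch=60, gain_per_semitone=1.0):
    """Raise or lower each note's velocity in proportion to its pitch height.

    notes: list of (midi_pitch, velocity) pairs
    reference_pitch: pitch left unchanged (assumed: middle C, MIDI 60)
    gain_per_semitone: velocity units added per semitone above the reference
    """
    adjusted = []
    for pitch, velocity in notes:
        new_velocity = velocity + gain_per_semitone * (pitch - reference_pitch)
        # Keep the result inside the MIDI note-on velocity range 1-127.
        adjusted.append((pitch, max(1, min(127, round(new_velocity)))))
    return adjusted


# Invented input: a rising C major arpeggio played at a uniform velocity.
flat = [(60, 64), (64, 64), (67, 64), (72, 64)]
shaped = higher_louder(flat)
```

The rule looks only at each note's pitch and knows nothing about phrase structure, which is what distinguishes it from a structure-sensitive rule such as "melodic charge emphasis."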

Finally, Juslin, Friberg and Bresin (2001-2002) have proposed a model which 
combines four elements corresponding to the acronym GERM: a set of generative rules 
(G) that relate musical structure to performance expression; principles of emotional 
expression (E) in performance; a component that introduces deliberate random vari- 
ability (R) in timing, intended to simulate the uncontrolled low-level variations that 
characterize the human motor system; and a component which is intended to cap- 
ture the motion character (M) of expressive performance. The generative rules in the 
first element are essentially those developed by Sundberg and his coworkers, but the 
addition of the three other elements is an attempt to develop a model that is more 
inclusive than previous attempts, and that accounts for the influence of features 
other than structure alone in the shaping of performance expression. An initial em- 
pirical evaluation of the model, reported in the paper, suggests that the E element is 
actually the most powerful, but that the idea of four separate but interacting com- 
ponents is indeed a powerful way to model performance expression, leaving the au- 
thors to conclude that "by considering all four components of the GERM model to- 
gether, we will be better able to understand the variability usually found in music 
performance data" (Juslin, Friberg and Bresin 2001-2002: 109). 

From an empirical point of view, the value of such models lies in the definite 
predictions that they make, against which one can investigate the phenomena of real 
performance. They may represent a rather more interesting baseline for the mea- 
surement of expressive deviation than the idealized flat line of the score — a baseline 
that, after all, has no psychological reality (repeated studies have shown that it is im- 
possible for a human performer to produce a completely deadpan, or expressionless, 
performance). In a similar manner, Parncutt, Sloboda, Clarke, Raekallio, and Desain 
(1997) offer an ergonomic model of piano fingering not as a plausible "solution" to 
the problem of how pianists decide on fingerings, but as a baseline of physical constraints
against which to assess actual fingering choices, which arguably are influenced
by much more than physical convenience. There has been relatively little empirical
investigation of the available models of expression, with the possible exception
of Clynes's work which, because of its controversial claims, has attracted rather more 
attention and investigation than others (e.g., Repp 1989, 1990b, Thompson 1989, 
Clynes 1990, 1995). Such empirical work as there is has generally aimed to test the 
models, assessing the success with which they simulate real performance. An alternative
approach, however, is to use the model as a tool, as a way of highlighting what
it is that makes human performances interesting. This is the perspective taken by my 
final example. 



Example: Schubert's Impromptu in G Flat Major 

Windsor and Clarke (1997) use Todd's (1992) model in conjunction with an expert 
performance of the first 16 bars of Schubert's G flat major Impromptu in order to in- 
vestigate different components of expressive timing and dynamics. After recording 
MIDI data (tempo and dynamics) from an expert performance on a Disklavier piano, 
Windsor and Clarke attempted to simulate the performance using Todd's model, 
with the only inputs being the pitches and rhythms of the score, and a phrase struc- 
ture analysis. Different simulation attempts varied only in the "weight" given to each 
level of the phrase structure, on the assumption that if the model has any validity it 
should be able to approximate the expert performance if an appropriate pattern of 
weightings could be found. The output from each attempt was compared with the 
data from the expert performance using correlation analysis, trying with successive 
simulations to arrive at the best fit (i.e., the highest correlation) between the simu- 
lation and the data. The study found that the best fit for the timing data required 
weighting the lowest levels of the phrase structure (half-bars and bars) more heavily 
than the middle and high levels; this weighting, however, resulted in a rather poor 
fit for the dynamic data, which were modeled much better by weighting the middle 
and high levels more heavily than the lower ones. This has implications both for the 
principles and design of the model, and for an interpretation of what the performer 
might be doing. In terms of the model, it demonstrates that the assumption that tim- 
ing and dynamics are rigidly and directly linked is too simplistic; and in terms of 
performance expression, it suggests that performers may be using timing and dy- 
namics in different ways to project (or respond to) different aspects of structure. In 
this performance, timing seemed to operate in response to more local structures, and 
dynamics to middle-level structures. 
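
The fitting procedure described above can be sketched in a few lines. What follows is a minimal illustration only, and is neither Todd's (1992) model nor Windsor and Clarke's actual procedure: it assumes that each level of the phrase structure contributes a fixed curve, that a simulation is the weighted sum of those curves, and that candidate weightings are drawn exhaustively from a small grid. The function names and the grid values are hypothetical.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(level_curves, weights):
    """Weighted sum of one curve per phrase-structure level (an assumption)."""
    return [sum(w * curve[i] for w, curve in zip(weights, level_curves))
            for i in range(len(level_curves[0]))]


def best_weights(level_curves, performance, grid=(0.0, 0.5, 1.0)):
    """Try every weighting on the grid; keep the one with the highest correlation."""
    best = None
    for weights in product(grid, repeat=len(level_curves)):
        if not any(weights):
            continue  # an all-zero weighting gives a flat curve with no variance
        r = pearson(simulate(level_curves, weights), performance)
        if best is None or r > best[1]:
            best = (weights, r)
    return best
```

Run separately on the timing data and on the dynamic data, a search of this kind would return two different weightings whenever the two parameters respond to different levels of structure, which is the pattern the study reports.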

The simulations, although approximating to the expert performance, captured 
its features in only a rather partial manner — and the study was equally concerned 
with those aspects of the performance that the model did not capture (it included a 
discussion of how these discrepancies might be explained). The advantage of the 
model, then, is that it highlights those features of the data that are not systematic, or 
that are systematic according to some different principle. Those same features might 
have emerged from a comparison with the "flat line" of the score, but they are thrown 
into much sharper relief when the data are analyzed against the background of an
attempt to model the generic features of performance by systematic means. The 
paper concludes (Windsor and Clarke 1997: 149): 

In this paper, we have demonstrated the way in which a model of perform- 
ance may act as a tool in the analysis of performance data. Although the 
model fails to account for every aspect of a human performance, and could 
possibly be revised in the light of the data collected here, these failures are 
seen as positive because they highlight different aspects of musical expression. 
The model provides a baseline that is derived from a strong theoretical posi- 
tion, against which other expressive strategies can be assessed. In its clear and 
unambiguous modeling of continuous expressive strategies, the model allows 
one to factor out noncontinuous strategies in a manner not possible when a 
performance is analyzed in relation to an isochronous score. 
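
The idea of a model as a baseline can be made concrete by looking at what is left over once the model's prediction has been subtracted from the data. The sketch below is an illustration under stated assumptions rather than the method of the paper: it fits the performance to the model output by ordinary least squares (a scale factor and an offset), and then reports the events whose residual exceeds a threshold chosen by the analyst. The function names and the threshold are hypothetical.

```python
def residuals_from_baseline(model, performance):
    """Fit performance = a * model + b by least squares; return the residuals."""
    n = len(model)
    mx, my = sum(model) / n, sum(performance) / n
    sxx = sum((x - mx) ** 2 for x in model)
    sxy = sum((x - mx) * (y - my) for x, y in zip(model, performance))
    a = sxy / sxx
    b = my - a * mx
    return [y - (a * x + b) for x, y in zip(model, performance)]


def outstanding_events(model, performance, threshold):
    """Indices of events that the fitted baseline misses by more than threshold."""
    residuals = residuals_from_baseline(model, performance)
    return [i for i, r in enumerate(residuals) if abs(r) > threshold]
```

The events returned are candidates for interpretation, not findings in themselves: they mark the places where the performer did something that the systematic model does not predict.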



Conclusions and Prospects 

Since the late 1970s, when the "contemporary" period of empirical work on per- 
formance began, there have been a number of significant achievements. A much more 
systematic description of a whole variety of phenomena that were previously only 
known in broad outline is now available, and as a result there is more extensive recog- 
nition of the importance of studying music as performance rather than in the score- 
based manner that has up to now been the dominant mode. This interest in the de- 
tailed characteristics of performance converges with a dramatically increased interest 
in the history of recorded performance. The confluence of these two lines of research 
raises the fascinating prospect of a much more serious and detailed examination of 
performance style and its historical changes, as well as a way of bridging the divide 
between the ways in which notated music and unnotated music are studied. If music 
is studied as performance, then whether it exists in a notated form or not becomes 
an issue that is separable from how (or whether) the music is studied at all. 

There have been problems, however. Because of the sheer volume of data that 
comes from even a relatively short performance, there has been the danger of a lack 
of perspective. It is easy to become buried in the expressive minutiae of a four-bar 
phrase and to forget that performance is rather more than tempo curves and dy- 
namic accents. A related problem is that the aims and types of explanation and dis- 
cussion that are of interest to psychologists working in this field may be of less in- 
terest to musicologists, and vice versa. Psychologists are, broadly speaking, more 
interested in general mechanisms (of motor control, perception, cognition) than in 
particular instances, while musicology tends to be primarily concerned with inves- 
tigating phenomena that are more narrowly focused and more specifically located in 
terms of history, culture, or genre. This has at times meant that the two disciplines, 
which should have much in common, pass each other like ships in the night. 
Equally, the repertories that have been investigated have been limited (most of the 
psychology of music has been concerned with the tonal, metrical concert music 
characteristic of the period from about 1750 to 1850), and the data have often been 
regarded in historically and culturally impoverished terms, though there has recently
been something of a shift here, with Repp's work on large collections of commercially
recorded performances spanning most of the twentieth century (e.g., Repp
1990a, 1992, 1998). 

Finally, many of the empirical methods described in this chapter have led to a 
thoroughgoing reification of performance — to a view of performance that treats it as 
a thing rather than a process. This is an almost inevitable consequence of the way in 
which temporal phenomena are transformed into static representations of one sort 
or another (graphs, data lists, written descriptions). Some of the work described in 
this chapter retains the more dynamic character of performance as a process, but this 
has been the exception, and the more general tendency is to convert performance, 
in one way or another, into something that is disturbingly like a score — an irony 
given that one of the motivations for studying performance in the first place (as ob- 
served at the start of this chapter) was to get away from the tyranny of the score. 

Various prospects for empirical work on performance can be envisaged. More 
automated methods for deriving data from commercial recordings seem inevitable 
(arriving on the back of digital signal processing technologies in studio recording 
and editing, and speech processing research), and these should enable much more 
extensive analyses of large bodies of recorded performance. This in turn will allow 
more diverse repertories to be tackled, and different questions to be explored. It will 
be interesting to see whether the emphasis then shifts from the concern with general 
mechanisms characteristic of existing published research toward more focused ques- 
tions relating, for instance, to specific styles, performers, or pieces. Equally, the di- 
versification of methods (the analysis of MIDI data and recorded performances now 
complemented by video and discourse analyses) promises a more integrated and 
complementary relationship between quantitative and qualitative approaches, and 
an overdue recognition of the kind of work that has gone on for many years within 
ethnomusicology (see Stock, chapter 2, this volume). This, and a greater awareness 
of social and developmental issues (see Davidson, chapter 4, this volume), signals 
an opportunity to move beyond the cognitive orientation of the last 20 years into a 
far more multidimensional approach to the study of musical performance. 



Notes 



1. There are additional features such as pitch bend and other so-called controller
values, but these have never been studied in the literature discussed in this chapter.

2. Information about POCO, and instructions for using it via the Web, can be found at
http://www.nici.kun.nl/mmm.

3. The issue of how a MIDI value based on the speed of a hammer, or a key depression,
relates to the actual loudness of the corresponding note complicates things a little.
But most synthesizers and Disklaviers implement an essentially linear relationship
between MIDI values and decibels (a measure of loudness that takes into account the
nonlinearity of the human auditory system in its response to the physical intensity of
a sound source; see chapter 8, this volume).

4. Strictly speaking the transformation expresses each data point as its reciprocal value:
value A in milliseconds is transformed into 1,000/A × 60, which converts it into
tempo expressed in beats per minute (BPM).
5. The only exceptions are a one-bar break in the left hand at the midpoint of the
piece, a similar one-bar break just before the three-chord cadence that concludes the 
piece, and the cadence itself (which consists of two minims and a semibreve). 

6. For a discussion of auditory streaming see chapter 8, this volume. 

7. The option of using independent judges is usually unrealistic, since in order to pro- 
duce a good set of "tap along" data, a person has to get to know the music very well 
so as to be able to anticipate and track tempo fluctuations. 

8. The distinction between quantitative and qualitative approaches can be simply 
expressed as the difference between phenomena that you can measure or give a 
value to (duration, dynamic, the number of occurrences of a word, the size of a 
movement, etc.), and those for which this is impossible (the semantic content of 
speech, the shape of a movement). Further discussion can be found in chapter 9, 
this volume. 

9. There is no attempt within Todd's model to simulate, or model artificially, the process
by which structure is assigned to a score.
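
The transformation from beat durations in milliseconds to tempo, given in the notes above as 1,000/A × 60, amounts to a single line of arithmetic. The function name below is hypothetical, and the check on non-positive values is an addition for safety rather than part of the note.

```python
def beat_ms_to_bpm(beat_ms):
    """Convert a beat duration in milliseconds to tempo in beats per minute."""
    if beat_ms <= 0:
        raise ValueError("beat duration must be positive")
    return 1000.0 / beat_ms * 60.0
```

A beat lasting 500 milliseconds therefore corresponds to 120 BPM, and one lasting a full second to 60 BPM.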
CHAPTER 6



Computational and Comparative Musicology 



Nicholas Cook 



Introduction 

The middle of the twentieth century saw a strong reaction against the comparative 
methods that played so large a part in the disciplines of the humanities and social 
sciences in the first half of the century, and musicology was no exception. The term
"comparative musicology" was supplanted by "ethnomusicology," reflecting a new 
belief that cultural practices could only be understood in relation to the particular 
societies that gave rise to them: it was simply misleading to compare practices across 
different societies, the ethnomusicologists believed, and so the comparative musi- 
cologist was replaced by specialists in particular musical cultures. A similar reaction 
took place in theory and analysis: earlier style-analytical approaches (largely mod- 
eled on turn-of-the-century art history) gave way to a new emphasis on the particu- 
lar structural patterns of individual musical works. Perversely this meant that the 
possibility of computational approaches to the study of music arose just as the idea 
of comparing large bodies of musical data — the kind of work to which computers 
are ideally suited — became intellectually unfashionable. As a result, computational 
methods have up to now played a more or less marginal role in the development of 
the discipline. 

In this chapter I suggest that recent developments in computational musicology
present a significant opportunity for disciplinary renewal: in the terms introduced
in chapter 1, there is potential for musicology to be pursued as a more data-rich
discipline than has generally been the case up to now, and this in turn entails
a re-evaluation of the comparative method. Central to any computational approach, 
however, are the means by which data are represented for analysis, and so I begin 
with some examples of "objective" data representations before introducing the issue 
of comparison. (The examples I discuss are graphic, but the same points could have 
been made in terms of numerical representations.) This is followed by an extended 
case study of an important current software package for musicological research, the 
Humdrum Toolkit, and the chapter concludes with a brief consideration of the 
prospects for computational methods in musicology. 
Issues of Representation

A picture, as everyone knows, is worth a thousand words. Each of the graphs in Fig- 
ure 6.1 (from Clendenning 1995: 248-249) represents a different aspect of the same 
section from Ligeti's Lux aeterna: (a) charts pitch against time, giving an immediate 
impression of the music's registral profile, (b) shows how many different pitches (not 
pitch classes) are present at any given point, and (c) highlights voice entries, with 
each separate voice (four each of soprano, alto, tenor, bass) having a separate hori- 
zontal line which is shaded whenever the voice enters, while (d) shows essentially 
the same information as (c), only in terms of the total number of entries at any given 
point. These graphs are "objective" in the sense that, once you have agreed how a 
graph is to be laid out, everyone should end up with the same result; the information 
is all there in the score, so that the analysis becomes, in effect, a matter of reformat- 
ting. But the decisions about how to lay them out are not necessarily self-evident. 
For example, in (a) to (c) the minimum values on the time axis are quavers, whereas 
in (d) they are whole bars; other values would have been possible, and the decision 
to use these particular values represents a judgment by the analyst that they will be 
the most informative in this context. There is also a degree of analytical judgment in 
the separate identification of the canons in (a) and (b). 
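
The sense in which such graphs are a matter of reformatting, and the sense in which the choice of time unit is nonetheless an analytical decision, can both be seen in a small sketch. It assumes a deliberately simple score representation, a list of (onset, duration, pitch) triples measured in quavers, and counts the distinct pitches sounding in each time slice, roughly in the manner of graph (b). The function name and the representation are hypothetical and are not taken from the analysis cited above.

```python
def pitch_density(notes, step=1):
    """Count the distinct pitches sounding in each time slice of width `step`.

    `notes` is a list of (onset, duration, pitch) tuples; onsets and
    durations are in the same unit as `step` (for instance quavers).
    """
    end = max(onset + duration for onset, duration, _ in notes)
    counts = []
    t = 0
    while t < end:
        sounding = {pitch for onset, duration, pitch in notes
                    if onset < t + step and onset + duration > t}
        counts.append(len(sounding))
        t += step
    return counts
```

Calling the function twice on the same notes, once with `step=1` and once with `step=2`, gives two different profiles of the same passage; neither is more objective than the other, and choosing between them is the kind of judgment described above.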

That said, however, it is obvious that there is scope for automation in graphs 
like this — and not just in drawing them up (the graphs in Figure 6.1 were gener- 
ated by a computer drawing package) but in the processing of the data. The graphs 
in Figure 6.2 (from Brinkman and Mesiti 1991: 5, 6, 13, 17) were automatically gen- 
erated from a machine representation of the score. 1 Again, each graph represents a 
different way of extracting and viewing the information in a score, this time bars 1 
to 26 of the first movement from Bartók's String Quartet no. 4: (a) corresponds to
Figure 6.1 (a), (b) shows each instrumental part separately with the vertical lines 
providing an impression of the linear movement from one pitch to another, and 
(c) charts the first appearance of each pitch, while (d) represents the dynamic level 
of each instrument (based on whether it is playing and on notated dynamic value) 
and of the whole (in effect a summation of the dynamic graphs of each instrument). 

But what are we to make of such representations? What do they enable us to see 
that we can't hear, and in any case, what is the point of seeing music at all? It is a 
commonplace to describe Ligeti's texture-oriented works as "visual," and Brinkman 
and Mesiti (who in their article offer a case study of Webern's Symphonie Op. 21) 
emphasize the spatial symmetry of Webern's serial structures, commenting that "the 
graphs allow us to view all parts together in their spatial environment. [This] is es- 
pecially helpful since the pitch symmetry that is essential to the musical structure of 
this work is somewhat obscured by the notated score" (Brinkman and Mesiti 1991: 
21). The question then is the extent to which this problem of obscuring applies to 
other repertories, in other words how valuable this kind of visualization is for music 
in general. Yet this question is in some ways misguided: there is no such thing as a 
good or bad representation of music per se, there are good and bad — or at least bet- 
ter and worse — representations for particular purposes. To say that visual represen- 
tations are more appropriate for twentieth-century than for other repertories, or for 
answering certain kinds of questions about the music than others, is simply to define
their usefulness, not to criticize them.

Nevertheless Brinkman and Mesiti produced some interesting results for music 
that one wouldn't normally think of as particularly "visual," including some strikingly 
different representations of music by Bach and Mozart as well as by Bartók, Schoenberg,
Wolpe, and Berio. At the very least, such comparisons serve to underline the
variety of textures found in different composers or styles — and texture is at the same 
time one of the most important aspects of music in terms of how we experience it, 
and one of the hardest to say anything useful about. It is not hard to imagine how 
Brinkman and Mesiti's software could be used within a variety of pedagogical con- 
texts, enabling students to literally and instantaneously "see" different pieces of music 
in a variety of different ways — and the ability to switch easily and quickly from one 
representation of music to another is a crucial element of practical musicianship. That 
this has not happened, and that the potential of the approach has not been fully re- 
alized, reflects the cost of translating academic research projects into software prod- 
ucts for the real world, and illustrates an all-too-familiar pattern in musicological soft- 
ware: a sustained burst of initial enthusiasm is followed by running out of money, 
resulting in software that is sometimes less than fully functional, often less than fully 
documented, rarely properly supported, and usually soon obsolete. 

But my principal answer to the question "what are we to make of such repre- 
sentations?" is a different one. Brinkman and Mesiti (1991: 7) emphasize the partic- 
ular usefulness of their graphs "as a device for comparing different pieces"; Matt 
Hughes, whose quantitative analysis of Schubert's Op. 94 no. 1 was discussed in 
chapter 1, similarly claimed of his method that it was "a tool for organizing data so
that one may more discernibly view tendencies and interassociations," adding that 
it was "best utilized when viewing groups of compositions or sections of a composi- 
tion rather than a single work" (Hughes 1977: 145, 150). And there is a general 
point to be made here: the more objective an analytical approach is, the more its mu- 
sicological value is likely to be realized through making comparisons between dif- 
ferent pieces — and sometimes large numbers of them. For example, when you carry 
out a Schenkerian analysis, you are in a sense making a comparison between the 
piece in question and the norms of common-practice voice leading, as systematized 
in the Schenkerian background and middleground. But the value of the analysis 
consists primarily in the lengthy process of making it, deciding which notes go with 
which, which are more important than others, and so forth; the process is lengthy 
because it involves a vast number of interpretive judgments, requiring you to weigh 
up different factors in relation to one another. At the end of it, you have a knowledge 
of the music — you might call it an intimacy — that you did not have at the outset, 
and there is a sense in which the final graph is significant mainly as a record of this 
learning process. With any kind of computational approach, by contrast, all of this 
happens automatically, and in some cases almost instantaneously; the only output is 
the graphic or numerical representation of the music that results. And such representations
rarely tell you anything very useful by themselves: they may well be vulnerable
to the cheap but telling jibe that they either show you what you hear anyhow
(in which case they are redundant), or else they don't (in which case they are
irrelevant). It is when you compare representations of different pieces that useful
information may emerge.

Figure 6.2. Graphic analyses of Bartók, String Quartet No. 4, I, bars 1-26 (from
Brinkman and Mesiti 1991: 5, 6, 13, 17).

The value of objective representations of music, in short, lies principally in the 
possibility of comparing them and so identifying significant features, and of using 
computational techniques to carry out such comparisons speedily and accurately. At 
this point, however, the issue of reduction (already broached in chapter 1, this volume)
has to be confronted. This can conveniently be explored in relation to the
analysis of folksongs, since a large proportion of early work in computational
analysis was based on such material: there were substantial existing collections of 
folksongs to work on, and the music was monophonic and scalar, hence could be 
coded very compactly — a vital consideration in the days when computer memory 
was a scarce and expensive commodity. But of course, if you code folksongs as a 
series of pitches, or scale degrees, then you are leaving out all the variables of intonation,
ornamentation, timbre, and other aspects of singing style, not to mention the
original contexts of use and cultural connotations of the songs. And that obviously
limits what you can conclude from comparing them.

BOEHME
CUT[ DEUTSCHLAND DEUTSCHLAND UEBER ALLES]
REG[ Europa, Mitteleuropa, Deutschland]
KEY[ B0001 16 F 4/4]
MEL[ 1 . 2__ 3 2 4 3 2_-7_1_
     6 5_^ 4 3 2 3_1_ 5
     1 .2_ 3 2 4 3 2_-7_1
     6 5 4 3 2 3__1_ 5
     2 3 2_-7_-5 4 3 2_-7_-5__
     5 4 3 .3_4# 4#_5_ 5
     + 1 . 7_ 7_6__5 6 .5_ 5_4_3
     2 .34 5_6_4_2_1 3_2_ 1
     + 1 . 7_ 1J>J> 6 .5__ 5_4_3
     2 .34 5_6_4_2_1 3_2_ 1 //]
FCT[ Volks - Hymne, national, politisch, Vaterlands - Lied]

Figure 6.3. EsAC code for "Deutschland über alles" (Helmut Schaffrath, Essen Musical
Data Package)

At this point a concrete example may be helpful, and Figure 6.3 is taken from 
the Essen Musical Data Package, a series of databases of popular and traditional 
vocal repertories together with tools for encoding and analysis. 2 Coordinated until 
his death in 1994 by Helmut Schaffrath, the Essen databases consisted by the fol- 
lowing year of some 10,000 songs, mainly European (and largely German), but with 
some representation of other traditions, particularly Chinese; they are still being 
added to, though development has been transferred from Essen to Warsaw (Selfridge-Field
1995). Figure 6.3 shows "Deutschland über alles" in the custom code
used by the package, EsAC (the Essen Associative Code), and several of the data 
fields are self-explanatory: BOEHME is the name of the database in which this song 
is located, CUT contains the title, REG is the region from which it comes, and FCT 
is the functional category of the song, while KEY contains not only its key (F) but 
also the numbering of the song in the database (B0001), the value of the smallest 
note value (16, a sixteenth or semi-quaver), and the time signature (4/4). The tune 
itself is contained in the field MEL, using a simple code based on the scale degree 
(with # for sharp and b for flat degrees), + and - to indicate shifts to a higher or
lower register, respectively, and a rhythmic notation based on the shortest value 
(semi-quaver), with a dot prolonging the value by 50 percent, a single underline 
character (_) by 100 percent, and a double underline character (__) by 200 percent.
That should suffice for the tune to be read; the only other information contained
in the MEL field is // (for the end of the melody), the use of spaces to indicate
bar lines (the melody has been notated across the bar lines), and the splitting
up of the melody into separate lines, each of which represents a phrase (this repre- 
sents a judgment on the part of the transcriber, and as such is a rather unusual fea- 
ture of the EsAC code). 
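
The compactness of the code means that a record of this kind can be read by a very short program. The sketch below is a simplified reading of the layout as described above, not the software distributed with the Essen package: it splits a record into its named fields and reduces the MEL field to one list of notes per phrase, each note being a register shift, a scale degree, and an accidental. The rhythmic marks (dots and underlines) are deliberately ignored, any other characters are skipped, and the function names and the short record used to try it out are hypothetical.

```python
import re

# A field is a name followed by its content in square brackets.
FIELD = re.compile(r"(\w+)\[(.*?)\]", re.DOTALL)
# A note is an optional run of register signs, a degree 1-7, an optional # or b.
NOTE = re.compile(r"([+-]*)([1-7])([#b]?)")


def parse_record(text):
    """Split an EsAC-style record into a dict of field name -> raw content."""
    return {name: body.strip() for name, body in FIELD.findall(text)}


def parse_melody(mel):
    """Return one list of (register_shift, degree, accidental) per phrase line."""
    phrases = []
    for line in mel.replace("//", "").splitlines():
        notes = [(signs.count("+") - signs.count("-"), int(degree), accidental)
                 for signs, degree, accidental in NOTE.findall(line)]
        if notes:
            phrases.append(notes)
    return phrases


# A made-up record for illustration; it is not taken from the Essen databases.
record = """TOY
CUT[ Example tune]
KEY[ X0001 16 F 4/4]
MEL[ 1_2_ 3_-7_
 +1__5_4# //]
"""
fields = parse_record(record)
phrases = parse_melody(fields["MEL"])
```

Because each line of the MEL field is kept as a separate list, the transcriber's phrase divisions survive the parsing and remain available to any later analysis.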

What can be done using this information? The distributed version of the Essen
database comes with a simple analytical utility (ANA), which takes a complete database
as its input, and generates as its output an annotated version of the file in which
up to 12 different kinds of analytical information are added for each song. These in- 
clude straightforward statistical information such as the distribution of ascending 
and descending intervals (how many ascending minor seconds?), the distribution of 
durations (how many crotchets, quavers, semi-quavers?), and distribution of scale 
degrees (how often does the fifth scale degree appear in the lower register?); there 
are also some derived measures such as the percentage of ascending steps and leaps. 
They also include some less obvious information, including the pattern of phrase 
repetitions in pitch and duration, a coding of the contour of each phrase, the 
melodic spine (based on the notes at metrically stressed beats), and the final notes 
of each phrase (this is where the coding of phrases comes in). 3 A limitation of the 
ANA software is that it provides data for each song separately, whereas the statistical 
information would be more useful if it were provided across the database as a whole, 
but there is a reason for this: the package was intended for use with a commercial 
relational database package, which would facilitate this kind of analysis. Needless to 
say, the original package is no longer obtainable. 4 
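
Measures of the kind just listed are easy to derive once a song is held as a list of phrases. The following sketch is not ANA and makes no claim to reproduce its output: it assumes that pitches are given as semitone numbers, treats any interval larger than two semitones as a leap, and counts intervals across phrase boundaries as well as within phrases. All names and definitions in it are hypothetical.

```python
from collections import Counter


def interval_stats(phrases):
    """Summarize melodic intervals (in semitones) across a song's phrases."""
    melody = [pitch for phrase in phrases for pitch in phrase]
    intervals = [b - a for a, b in zip(melody, melody[1:])]
    moving = [i for i in intervals if i != 0]  # repeated notes are set aside
    ascending = [i for i in moving if i > 0]
    total = len(moving)
    return {
        "distribution": Counter(intervals),
        "pct_ascending": 100.0 * len(ascending) / total if total else 0.0,
        "pct_ascending_leaps": (
            100.0 * sum(1 for i in ascending if i > 2) / total if total else 0.0
        ),
        "phrase_finals": [phrase[-1] for phrase in phrases],
    }
```

Pooling such summaries over every song in a database, rather than reporting them song by song, is a matter of adding the `Counter` objects together, which is the step that the original package left to external database software.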

But what can be done using this information? Schaffrath himself published a 
number of studies based on his datasets, in which he used this kind of analytical in- 
formation to support broad stylistic generalizations, for example as between differ- 
ent traditions: "German folk songs," he concluded from one such study (1992: 107), 
"generally skip more often up than down; in the Chinese pool the opposite ap- 
plies. . . . German songs use the interval of a fourth 51% more often than Chinese 
songs do. This might be explained by the preference for pentatonic scales." But there 
is also a different way in which such information can be used, which is for purposes 
of searching, identification, and classification. A database of 10,000 folksongs is not 
unlike a bibliographic database, and to locate individual items or groups of related 
items you search for particular features, combining them so as to narrow the search 
down to a manageable number of hits. In some contexts all you need to search a bib- 
liographic database is the author's name; in other cases you need to combine an au- 
thor name, two title keywords, and year to locate the article you need. In the same 
way, you might try to locate a particular song by searching for its first few notes or 
incipit; this is the way that traditional dictionaries of themes operate (for instance, 
Barlow and Morgenstern 1983), and it works well with "art" music, where there are 
fixed scores. But it is not such a good way to search for folksongs, which typically 
exist in a large number of variants, often involving variation or the interpolation of 
notes. That is why ANA includes a routine to derive the final notes of each phrase: as 
Schaffrath (1997: 349) explains, "although variants may contain different numbers 
of notes or melodic contours, they tend to retain underlying harmonic structures." 
Or you could combine such a search with a particular melodic spine and/or pattern 
of phrase repetition. And whereas I have described the use of such features in terms 
of simply locating items, the same processes can be used to group together songs or 
repertories in ways that might reflect, for example, historical development, linguis- 
tic associations, or conditions of use. In such ways the automated analysis of musi- 
cal style can become a means of generating or verifying musicological hypotheses. 
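
The difference between searching by incipit and searching by phrase endings can be shown with a toy example. The sketch below assumes that each song is stored as a list of phrases of scale degrees, and it groups songs whose phrases end on the same sequence of degrees; the song titles, the data, and the function names are invented for illustration and do not come from the Essen databases.

```python
from collections import defaultdict


def phrase_finals(phrases):
    """Reduce a song to the final scale degree of each phrase."""
    return tuple(phrase[-1] for phrase in phrases)


def group_by_finals(songs):
    """Group song titles whose phrases end on the same sequence of degrees."""
    groups = defaultdict(list)
    for title, phrases in songs.items():
        groups[phrase_finals(phrases)].append(title)
    return dict(groups)


def same_incipit(a, b, length=4):
    """Compare the first `length` notes of two songs (the incipit approach)."""
    flat_a = [degree for phrase in a for degree in phrase][:length]
    flat_b = [degree for phrase in b for degree in phrase][:length]
    return flat_a == flat_b


# Invented data: two variants of one tune and one unrelated tune.
songs = {
    "variant A": [[1, 2, 3, 5], [4, 3, 2, 1]],
    "variant B": [[1, 3, 2, 3, 5], [4, 2, 1]],
    "other": [[5, 4, 3, 2], [3, 2, 1, 5]],
}
groups = group_by_finals(songs)
```

Here the two variants differ in their opening notes and in the number of notes per phrase, so an incipit search fails to connect them, whereas the phrase finals place them in the same group.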

Or at least that is the aspiration. In practice, EsAC and ANA suffer from obvious 
limitations, ranging from the pragmatic (the output of ANA is only marginally more 
user-friendly than machine code) to those of principle. EsAC can cope with only a
narrow range of music (monophonic, restricted to three octaves, and so on), and is 
well adapted for only a narrow range of applications. ANA extracts a limited number 
of features, each of which represents a radical reduction of the music and is intended 
for a particular purpose (such as statistical generalization or pattern matching); 
moreover, the usefulness of these reductions depends on a number of quite specific 
assumptions, such as that the final-note patterns of phrases vary less than incipits. 
All this makes sense in terms of what the Essen package is designed for; it is, to adopt 
outdated software jargon, a "turnkey" solution to folksong analysis, or at least to a 
certain sort of folksong analysis. And it would be possible to imagine trying to de- 
velop the code, and the analytical software, in such a way as to be more flexible — 
to cope with music of any desired degree of complexity and variety, and to extract 
from it every feature that could be of any possible interest under any possible cir- 
cumstances. That would be like the approach of business software developers in 
the 1980s, who aimed at prepackaged solutions in which every eventuality had 
been foreseen and the appropriate feature bolted on somewhere, if only you could 
find it. 

In musicology as perhaps also in the office, this represents a basic misappre- 
hension — in terms not only of the unlimited variety of musical materials, but also 
of the variety of purposes for which people will want to represent them; as Huron 
(1992: 11) puts it, "The types of goals to which music representations may serve are 
legion and unpredictable." Complexity is not the solution. There are, after all, much 
more complicated codes than EsAC which fulfill certain purposes extremely well 
while being very poorly adapted for others. MIDI code has transformed entire sec- 
tors of the music industry, but because it tries to understand all music on the model 
of keyboard music its handling of microtonal inflection is inelegant, to say the least; 
it also doesn't distinguish enharmonic equivalents, and has to be drastically recon- 
figured if useful analytical information is to be extracted from it — even something 
as simple as locating a C major triad. 5 Again, DARMS 6 is an extremely powerful code 
designed for purposes of printing and publication (it is the basis of all A-R Edition 
scores), and accordingly it understands music as a kind of graphics; it doesn't think 
of an e 1 but of a notehead on the bottom line of a stave that has an F clef on it. As a 
result, it too needs drastic reconfiguration if analytically useful information is to be 
extracted from it (Brinkman 1986). And though the problems of multiple codes are 
alleviated by the existence of interchange utilities (e.g., to turn DARMS into MIDI), 
there are limitations to what is possible: since MIDI does not distinguish between C♯ 
and D♭, the information is simply not there to translate into DARMS, which does 
make the distinction. 
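
The loss of information can be made concrete with a small sketch. It assumes the usual MIDI numbering in which middle C is 60; the function name and the way a spelled pitch is passed in are illustrative choices, not part of MIDI or DARMS.

```python
# Minimal sketch: converting spelled pitches to MIDI note numbers.
# spelled_to_midi is a hypothetical helper; middle C = 60 is the usual convention.

STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def spelled_to_midi(step: str, alter: int, octave: int) -> int:
    """Map a spelled pitch to a MIDI note number.

    alter is +1 for a sharp, -1 for a flat, 0 for a natural;
    octave 4 is the octave that starts on middle C.
    """
    return 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter

c_sharp = spelled_to_midi("C", +1, 4)
d_flat = spelled_to_midi("D", -1, 4)

# Both spellings collapse onto the same number, so a translator working
# from MIDI has nothing to tell it which spelling the score used.
print(c_sharp, d_flat)
```

The mapping is many-to-one, which is why translation in the opposite direction can only guess at the spelling.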

As with business software, the solution to this problem of multiple requirements 
is not the creation of ever more complex and unwieldy integrated solutions, but a 
modular approach involving an unlimited number of individual software tools de- 
signed to serve individual purposes; that way, when you want to do something new, 
you simply design a new tool to do it. Modular approaches like this, however, need 
some kind of framework within which to work — in essence, a set of rules for the 
representation of data in machine-readable files, and for the transfer of information 
between different software tools. Humdrum is an example of just such a framework. 
A Case Study: The Humdrum Toolkit 

Developed by David Huron in the early 1990s, Humdrum 7 consists of two distinct 
elements: in the first place a syntax — the set of rules I referred to — and in the sec- 
ond, a "toolkit" consisting at present of something over 70 separate software routines 
for manipulating the data in different ways for different purposes. It runs in a UNIX 
environment, 8 and its modular approach is much more readily understood by those 
who have experience of UNIX than those who do not. Like Humdrum, UNIX can be 
thought of as a set of rules plus a set of tools: individual UNIX commands (which 
have names like grep, sort, and cat) do simple things like searching files for a partic- 
ular string and extracting, deleting, or altering it, or rearranging data within or be- 
tween files. The power and flexibility of the system comes not from these individual 
tools but from the possibility of "pipelining" them so that the output of one tool be- 
comes the input of the next, forming chains of commands of unlimited length. The 
Humdrum toolkit is simply a set of additional UNIX tools designed specifically for 
manipulating music-related data. It is this open and modular philosophy, which al- 
ways allows for an additional tool to be added when required, that explains why — 
as Huron (1997: 375) puts it — "Humdrum is especially helpful in music research 
environments where new and unforeseen goals arise." Unfortunately, as we shall see, 
it also means that the user has to have a detailed understanding of the way the data 
are encoded and manipulated. Humdrum, in short, has a steep learning curve. 
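
The idea of pipelining can be imitated in a few lines of Python. This is only an analogy for the UNIX mechanism, not Humdrum itself: the three stage functions and the `pipeline` helper are invented for illustration, and the input is the first phrase of the kern melody shown in Figure 6.4.

```python
from collections import Counter
from functools import reduce

def pipeline(*stages):
    """Chain stages so that each one's output becomes the next one's input."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)

def drop_barlines(lines):
    """Remove bar lines (tokens beginning with '=')."""
    return [ln for ln in lines if not ln.startswith("=")]

def durations_only(lines):
    """Strip pitch letters and accidentals, leaving the duration figure."""
    return [ln.rstrip("abcdefg-#") for ln in lines]

def tally(lines):
    """Count how often each distinct token occurs."""
    return Counter(lines)

phrase = ["4.f", "8g", "=1", "4a", "4g", "4b-", "4a", "=2", "8g", "8e", "4f"]

count_durations = pipeline(drop_barlines, durations_only, tally)
result = count_durations(phrase)
print(result)
```

Each stage does one simple thing; a different question would be answered by swapping or adding a stage rather than by rewriting the whole chain.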

All this may sound very abstract, so Figure 6.4 shows the first two phrases of 
"Deutschland über alles" in kern code, the normal Humdrum format for representing 
notated music (only two phrases because when printed out, though not in 
terms of internal storage, kern is much less compact than EsAC). 9 This is a direct 
translation into kern of Figure 6.3, so the same information may be found in the data 
fields preceded by !!! (the exclamation marks identify them as comments); only the 
information concerning Schaffrath, the copyright statement, and the !!!ARL line 
(which gives latitudinal and longitudinal coordinates) are new. As for the other lines, 
**kern means that what follows is in kern (the double asterisk means that this label 
defines the data that follows), and the lines preceded by a single asterisk define the 
instrument category and instrument (in each case "vox" or voice), meter (4/4), key 
signature (B♭, the flat being designated by "-"; the sign for sharp is "#"), and 
key (F). 10 The tune itself begins on the following line, preceded by a curly bracket 
(this indicates the phrases, while the lines beginning with "=" are bar lines). "4.f" 
means that the first note is the F above middle C (if it were an octave lower it would 
be F, if an octave higher ff, if an octave higher than that fff, and so on); the "4" means 
that it is a quarter note (crotchet), and the ".", unsurprisingly, that it is dotted. It 
should now be possible to read the melody quite straightforwardly. (The apparently 
simplistic reciprocal notation for durations is unexpectedly flexible, coping with 
most things short of Ferneyhough; triplet quavers, for instance, work out as 12, 
quintuplet semiquavers as 20.) That explains everything in Figure 6.4, except the "*-", 
which marks the end of the record. It should also explain everything in Figure 6.5, 
which shows the same phrases (with the comments omitted), as set in G major by 
Haydn in the second movement of his String Quartet Op. 76 no. 3: instead of the 
single column (or spine) of Figure 6.4, there are now four, one for each line of the 
music. It is helpful to think of this as being laid out like staff notation, only rotated 
by 90 degrees. 

!!!OTL: DEUTSCHLAND DEUTSCHLAND UEBER ALLES
!!!ARE: Europa, Mitteleuropa, Deutschland
!!!ARD: Europa%Mitteleuropa%Deutschland@
!!!ARL: 51.5/10.50
!!!SCT: B0001
!!!YEM: Copyright 1995, estate of Helmut Schaffrath.
**kern
*ICvox
*Ivox
*M4/4
*k[b-]
*F:
{4.f
8g
=1
4a
4g
4b-
4a
=2
8g
8e
4f}
{4dd
4cc
=3
4b-
4a
4g
8a
8f
=4
2cc}
*-

Figure 6.4 Kern code for first two phrases of "Deutschland über alles."
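
The reading rules just described (a reciprocal duration figure, optional dots, then a pitch letter whose case and repetition give the octave) can be sketched as a small parser. This is a simplified illustration rather than a full kern reader: the function name is hypothetical, and rests, ties, and the many other kern signifiers are deliberately not handled.

```python
import re
from fractions import Fraction

# duration figure, dots, repeated pitch letter, accidentals ("#" sharp, "-" flat)
TOKEN = re.compile(r"^(\d+)(\.*)([a-gA-G]+)([#-]*)$")

def parse_kern_note(token: str):
    """Return (duration in quarter notes, letter, octave, accidentals).

    Handles only plain notes; phrase brackets are stripped and ignored.
    Octave 4 is the octave starting on middle C.
    """
    m = TOKEN.match(token.strip("{} "))
    if m is None:
        raise ValueError(f"not a plain kern note: {token!r}")
    recip, dots, letters, accidentals = m.groups()
    # Reciprocal notation: "4" means 1/4 of a whole note, i.e. one quarter note.
    total = extra = Fraction(4, int(recip))
    for _ in dots:          # each dot adds half of the previously added value
        extra /= 2
        total += extra
    if letters[0].islower():
        octave = 4 + (len(letters) - 1)   # f = F4, ff = F5, fff = F6
    else:
        octave = 3 - (len(letters) - 1)   # F = F3, FF = F2
    return total, letters[0].upper(), octave, accidentals

print(parse_kern_note("{4.f"))   # first note of Figure 6.4
print(parse_kern_note("4b-"))
print(parse_kern_note("20e"))    # a quintuplet semiquaver: five to the quarter note
```

Because the duration figure is simply a divisor of the whole note, tuplets need no special syntax: 6 is a triplet crotchet (three in the time of a half note) and 20 a quintuplet semiquaver.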

Kern is not as easy to read as EsAC (which, apparently, was designed to be sight 
read). And it would seem to be better adapted for linear textures than, say, keyboard 
music, though in fact there is no limitation in this respect, partly because you can 
put more than one note on a single line within any one spine, and also because you 
can collapse spines into one another or create new ones at any point. It is also more 
flexible than it looks. In the first place, you don't have to specify everything in 
Figures 6.4 or 6.5; there is no need to encode phrases, or bar lines, or durations, or 
pitches (in fact the only mandatory lines are "**kern" and "*-"). In the second 
place, there is a large number of further codes and interpretations built into kern, 
allowing you for instance to include symbols for articulation or ornamentation, 
clefs, stem direction, or the number of staff lines, if any (and whether they should 
be colored or dotted). But more than that, **kern spines can be mixed with others, 
including, for example, such predefined representations as **text (for lyrics) or 
**IPA (lyrics, but now notated using the International Phonetic Alphabet), or new 
representations defined by the user: you might, for example, create spines labeled 
**bowing, to show how a given pattern is (or might be) bowed, or **timing, to 
encode the rhythmic nuances in a particular performance, or **heartbeat, to record 
listeners' physiological responses to hearing the music. Then again, you might not use 
**kern at all, but an alternative representation for pitch; predefined representations 
include **pitch (using the International Standards Organization format), **deg 
(scale degrees, as in EsAC), **solfg (French fixed-do solfege), **freq (frequency 
expressed in cycles per second), or the self-explanatory **midi; any of these 
representations can be automatically translated to any other, provided of course that the 
relevant information is there to be translated (recall the problem with MIDI and 
enharmonic equivalents). There is also a particularly well-developed **fret 
representation, which turns pitch notation into tablature notation based on any specified 
number of strings and tuning, or of course you might always invent a new 
representation for some particular repertory or purpose. 

**kern    **kern    **kern    **kern
*k[f#]    *k[f#]    *k[f#]    *k[f#]
*G:       *G:       *G:       *G:
*clefG2   *clefG2   *clefC3   *clefF4
*M4/4     *M4/4     *M4/4     *M4/4
*tb24     *tb24     *tb24     *tb24
=0-       =0-       =0-       =0-
2r        2r        2r        2r
4.g       4.B       4G        4G
.         .         4r        4r
8a        8d        .         .
=1        =1        =1        =1
4b        4g        2r        2r
4a        4f#       .         .
4cc       4a        4d        4F#
4b        4g        4d        4G
=2        =2        =2        =2
8a        4c        8F#       4D
8f#       .         8A        .
4g        4B        4G        4GG
4ee       4cc       2r        2r
4dd       4b        .         .
=3        =3        =3        =3
4cc       4f#       4A        4D
4b        4g        4B        4G
4a        2e        4e        4C
8b        .         4G        4C#
8g        .         .         .
=4        =4        =4        =4
2dd       2d        2F#       2D

Figure 6.5 Kern code for Haydn, String Quartet Op. 76, no. 3, II, bars 1-4
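
To show what working with spines involves in practice, here is a minimal sketch that splits tab-separated, Humdrum-style text into columns and pulls the note data out of one of them. It assumes that the spines never split or merge, which real Humdrum files may do; the function names are invented, and the excerpt is bar 1 of the outer parts of Figure 6.5.

```python
def read_spines(text: str):
    """Split tab-separated Humdrum-style text into one token list per spine.

    Assumes a fixed number of spines throughout (no splitting or merging).
    Global comments (lines beginning '!!') and empty lines are skipped.
    """
    rows = [line.split("\t") for line in text.splitlines()
            if line and not line.startswith("!!")]
    return [list(column) for column in zip(*rows)]

def data_tokens(spine):
    """Keep only data: drop interpretations (*), bar lines (=),
    local comments (!), and null tokens (.)."""
    return [t for t in spine if t != "." and not t.startswith(("*", "=", "!"))]

excerpt = (
    "**kern\t**kern\n"
    "=1\t=1\n"
    "4b\t2r\n"
    "4a\t.\n"
    "4cc\t4F#\n"
    "4b\t4G\n"
    "*-\t*-\n"
)

violin, cello = read_spines(excerpt)
print(data_tokens(violin))
print(data_tokens(cello))
```

The null token "." is what keeps the columns aligned in time: the cello's half rest lasts through the violin's second quarter note, so the cello has nothing new to report on that line.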

And what does all this enable you to do? For a start, you can replicate the func- 
tions of ANA: to extract the melodic spine of Figure 6.4, for instance, you locate the 
downbeat of each bar (search for lines beginning "=" and then skip to the next line), 
skip over the duration figures, and extract the pitch symbol(s) to a new file. Or you 
could just as easily search an entire "BOEHME" database for melodic motions from 
scale degree 4 to scale degree 5 and establish how frequently they occur by comparison with the reverse 
motion (you would translate the database into **deg records in order to do this, and 
search for adjacent lines containing 4 and 5, skipping over irrelevant symbols like 
durations, and finally counting the number found). Or you could search all of Bach's 
chorales (the Riemenschneider collection is available in kern code) in order to es- 
tablish how often the melodic leading-notes are approached from beneath, as against 
from above: to do this, you extract the spines containing the melodies, turn them 
into **deg records, and count the occurrences of "^7" as against "v7." 11 (This particular 
task is easier than it sounds, because **deg automatically codes the direction 
in which a given scale degree was approached, though not the size of the interval in- 
volved). Then again, shifting to a different level of complexity, you could analyze the 
relationship between a particular succession of vowels in song texts and the melodic 
or rhythmic contexts within which they occur. You could establish how far particu- 
lar harmonic formations are correlated with specific metrical locations (and analyze 
the results by composer, period, or geographical origin). Or you could classify dif- 
ferent melodic, harmonic, or structural contexts in Chopin's complete mazurkas, 
and use this as the basis for analyzing timing information extracted from a large 
number of recordings. 
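
The two counting tasks described above can be sketched as follows. The sketch assumes that the scale degrees have already been extracted into plain Python lists; the token forms "^7" and "v7" follow the description in the text, the function names are invented, and the sample data are the scale degrees of the "Deutschland über alles" excerpt in Figure 6.4 (in F major), with the approach directions worked out by hand.

```python
from collections import Counter

def count_motions(degrees):
    """Count every ordered pair of adjacent scale degrees."""
    return Counter(zip(degrees, degrees[1:]))

def count_approaches(deg_tokens, degree="7"):
    """Count approaches to one degree from below ('^') and from above ('v')."""
    counts = Counter({"^": 0, "v": 0})
    for tok in deg_tokens:
        if len(tok) >= 2 and tok[0] in "^v" and tok[1:] == degree:
            counts[tok[0]] += 1
    return counts

# F G A G Bb A G E F | D C Bb A G A F C, as scale degrees in F major
degrees = [1, 2, 3, 2, 4, 3, 2, 7, 1, 6, 5, 4, 3, 2, 3, 1, 5]
motions = count_motions(degrees)
print("4 to 5:", motions[(4, 5)], " 5 to 4:", motions[(5, 4)])

# The same notes with the direction of approach marked
tokens = ["1", "^2", "^3", "v2", "^4", "v3", "v2", "v7", "^1",
          "^6", "v5", "v4", "v3", "v2", "^3", "v1", "^5"]
print(count_approaches(tokens))
```

On a single tune the counts are of course meaningless; the point of the approach is that exactly the same few lines would run unchanged over a database of thousands of songs.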

But the best way to see what Humdrum is capable of is to examine published 
studies that have used it, of which the majority (though by no means all) are by 
Huron and his coworkers; I shall briefly describe five. An article on the melodic arch 
in folksongs (Huron 1996) is based on the Essen collection and provides a direct 
comparison with Schaffrath's own work. Its purpose is to test systematically the fre- 
quent, but inadequately supported, claim that "arch" contours are prevalent through- 
out Western folksongs, and Huron points out the dangers of "unintentional bias" 
(1996: 3) when such generalizations are supported by a small number of possibly 
handpicked examples: they can be firmly established only through the analysis of 
data sets large enough to allow the drawing of statistically significant conclusions. 
He puts forward two basic ways in which folksongs might be said to exhibit arch- 
shaped contours — within each phrase, and overall — and systematically analyzes 
the Essen collection for each. For the first test, individual phrases throughout the 
collection were sorted according to the number of notes in them, and the average 
contour for each phrase-length computed; for the second, the average pitch of each 
phrase of each song was computed, and the resultant contour classified. The overall 
conclusion was that there is a strong tendency toward arch-shaped contours on both 
measures (though a third approach, based on the nesting of arches within one an- 
other, was not supported). 
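
The first of these tests, averaging the contour of all phrases of a given length, can be sketched in a few lines. This is an illustration of the general procedure as described here, not a reconstruction of Huron's code: the pitch numbers are invented, and the `is_arch` criterion is a deliberately crude stand-in for a proper statistical test.

```python
from collections import defaultdict
from statistics import mean

def average_contours(phrases):
    """Group phrases by number of notes and average them position by position."""
    by_length = defaultdict(list)
    for phrase in phrases:
        by_length[len(phrase)].append(phrase)
    return {n: [mean(column) for column in zip(*group)]
            for n, group in by_length.items()}

def is_arch(contour):
    """Crude test: the highest point lies strictly inside the phrase."""
    peak = contour.index(max(contour))
    return 0 < peak < len(contour) - 1

# Invented phrases, written as MIDI-style pitch numbers
phrases = [
    [60, 64, 67, 64, 60],
    [62, 65, 69, 67, 62],
    [60, 62, 64],
]

averages = average_contours(phrases)
for length, contour in sorted(averages.items()):
    print(length, contour, "arch" if is_arch(contour) else "not an arch")
```

Averaging position by position is what makes it necessary to sort the phrases by length first: only phrases with the same number of notes can be laid on top of one another.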

A project like this might be described as regulative: it takes an existing musico- 
logical claim and tests it rigorously against a large body of relevant data, adopting cri- 
teria of significance and certainty that may be unfamiliar in musicology but are nor- 
mal in data-rich disciplines. Two further studies illustrate the range of musicological 
claims that can be tested in this manner. A study carried out in collaboration with 
Paul von Hippel (Hippel and Huron 2000) focused on the "gap-fill" model of melody, 
according to which listeners expect melodic leaps to be followed by a change of di- 
rection; this is an important element of Narmour's "implication-realization" theory, 
which seeks to explain how listeners experience music on the basis of whether or not 
implications set up by the music are realized in what follows. What is at issue is not 
whether or not melodic leaps are usually followed by a change of direction, which is 
undoubtedly the case; it is whether or not this happens (as Krumhansl has concluded 
on the basis of experimental studies) 12 because of listener expectations. The alterna- 
tive is that it is a trivial consequence of the limited registral range of vocal music, a 
possibility which von Hippel and Huron (2000: 63) explain by comparison with 

a simplified melody that is confined to a range of three adjacent pitches — 
for example, A, B, and C. In such a melody, the only skip available is between 
A and C. Upward, this skip must land at the top of the range; downward, it 
must land at the bottom. After a skip, therefore, there is no way for a melody 
to continue moving in the same direction. On the contrary, two of the three 
available pitches can be reached only by a reversal. Although most melodies 
have a wider range, they are subject to the same basic argument. 

If this second possibility is the case, then gap-fill patterns will be no more common 
in real folksongs than in randomly generated melodies with otherwise comparable 
characteristics — and, to cut a long story short, this is exactly what von Hippel and 
Huron found. If their argument (which I have grossly simplified) is accepted, then 
it provides a striking instance of how studies based on large data sets can raise ques- 
tions about conclusions derived even from experiments as meticulously controlled 
as Krumhansl's. 13 
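
The range argument quoted above can be checked mechanically. The sketch below is not von Hippel and Huron's procedure; it simply generates uniformly random melodies (an assumption made for illustration) within a narrow and a wider range, and classifies what happens after each skip. Pitches are numbered as scale steps, so that A, B, and C become 0, 1, and 2.

```python
import random

def after_skip_outcomes(melody, skip=2):
    """Classify the motion that follows each skip of at least `skip` steps."""
    counts = {"reversal": 0, "continuation": 0, "repeat": 0}
    for a, b, c in zip(melody, melody[1:], melody[2:]):
        leap = b - a
        if abs(leap) < skip:
            continue                      # steps and repeats are not skips
        following = c - b
        if following == 0:
            counts["repeat"] += 1
        elif (following > 0) == (leap > 0):
            counts["continuation"] += 1   # keeps going in the same direction
        else:
            counts["reversal"] += 1
    return counts

rng = random.Random(0)                    # fixed seed so the run is repeatable
narrow = [rng.choice([0, 1, 2]) for _ in range(1000)]   # three adjacent pitches
wide = [rng.choice(range(8)) for _ in range(1000)]      # an octave's worth

narrow_counts = after_skip_outcomes(narrow)
wide_counts = after_skip_outcomes(wide)
print("narrow range:", narrow_counts)
print("wider range: ", wide_counts)
```

In the three-pitch melody a continuation after a skip is impossible by construction. In the wider range continuations do occur, but reversals still predominate, because a skip tends to land near the edge of the range, and this happens without any listener expectations being involved.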

A third study (Huron 2001) makes a similar kind of point in relation to "art" 
music. It reassesses Allen Forte's (1983) set-theoretical analysis of the first move- 
ment of Brahms's String Quartet Op. 51, no. 1 — an application to high Romantic 
music of an approach originally developed for twentieth-century music, bringing 
with it an appearance of objectivity and rigor in stark contrast to conventional motivic 
descriptions of such music. (In other words, Forte's article is a late expression 
of the postwar culture of objectivity described in chapter 1, this volume.) Whereas 
a "motive" is simply a characteristic melodic figure that recurs prominently through- 
out a piece, Forte's "alpha" ic (interval class) pattern can be defined much more pre- 
cisely: it is a particular intervallic configuration that may occur in any of the stan- 
dard transforms of set theory (prime, inversion, retrograde, retrograde inversion), in 
any rhythmic configuration, and at any metrical point. So Huron proceeds to search 
systematically for Forte's "alpha" pattern, not only in the first movement of Op. 51, 
no. 1 but also in the first movements of Brahms's other quartets, a comparable 
repertory in which the "alpha" pattern, if it is really characteristic of Op. 51, no. 1, 
should be significantly less common. First, he demonstrates that, as defined by 
Forte, the "alpha" pattern is common in Op. 51, no. 1, but no more so than in the 
other quartets. However, if you restrict the search to instances of the pattern that 
begin on metrically strong beats and come at the beginning of slurs, then it is more 
prevalent in Op. 51, no. 1 than in the others. This is particularly the case if you look 
for the pattern only in its prime form, and even more so if you consider only those 
cases where the pattern coincides with a long-short-long rhythmic pattern. So the 
prevalence of Forte's "alpha" pattern is confirmed — but only under precisely those 
conditions that are captured by the conventional idea of a motive! The conclusion 
must be that the appearance of rigor in Forte's analysis as against traditional analyti- 
cal descriptions is just that, an appearance. 
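
A search of this kind, for an intervallic pattern in any of its four standard transforms, can be sketched as follows. The pattern used here is a hypothetical one chosen for illustration and is not Forte's "alpha"; intervals are measured in semitones between successive pitches, the melody is invented, and the further restrictions Huron applied (metrical position, slurs, rhythm) are left out.

```python
def intervals(pitches):
    """Successive pitch intervals in semitones."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))

def transforms(pattern):
    """The four standard forms of an interval pattern.

    Inverting negates each interval; reading the notes backwards reverses
    the order of the intervals and negates them as well.
    """
    prime = tuple(pattern)
    return {
        "prime": prime,
        "inversion": tuple(-i for i in prime),
        "retrograde": tuple(-i for i in reversed(prime)),
        "retrograde inversion": tuple(reversed(prime)),
    }

def find_pattern(pitches, pattern):
    """Return (position, form) for every occurrence of any transform."""
    ivs = intervals(pitches)
    n = len(pattern)
    hits = []
    for name, form in transforms(pattern).items():
        for i in range(len(ivs) - n + 1):
            if ivs[i:i + n] == form:
                hits.append((i, name))
    return sorted(hits)

hypothetical_pattern = (3, -1)            # up a minor third, down a semitone
melody = [60, 63, 62, 59, 60, 59, 62]     # invented pitches, MIDI-style numbers
hits = find_pattern(melody, hypothetical_pattern)
print(hits)
```

Counting hits in one movement and comparing the total with the counts in comparable movements is then a matter of running the same search over several files.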

Important and indeed salutary as this regulative function may be, it would be 
wrong to give the impression that computational analysis is good only for testing the 
validity of existing theoretical or analytical claims. It can also be used to undertake 
work that could hardly be achieved in any other way. A spectacular example is the 
correlation of musical features with geography, a project once again based on the 
Essen database (Aarden and Huron 2001). Recall that the Essen records include information 
concerning the geographical origins of the songs contained in them, 
sometimes down to the level of an individual town or village, and that the kern version 
of "Deutschland über alles" added longitudinal and latitudinal coordinates in 
the !!!ARL field (a new field defined specifically for Aarden and Huron's project). 
Standard Humdrum commands can be used to extract records from the database 
and output their coordinates; these are then used as input to a mapping program. By 
way of example, Figure 6.6 shows the distribution of major- and minor-mode songs 
within the Essen database; these maps were generated using the GEO-Music site 
(http://www.music-cog.ohio-state.edu/cgi-bin/Mapping/map.pl, not accessible at 
publication time), established as part of the project. It is hard to know what conclu- 
sions might be drawn from this comparison (other than that the major mode is con- 
siderably more widespread than the minor); the southward skewing of the major- 
mode data as against the minor simply reflects the three occurrences in Italy — 
hardly a basis for robust generalization — while the concentration in Germany and 
central Europe evident in both cases reflects the bias of the Essen database. A fully 
fledged musicological research project would no doubt involve dealing with more 
features at a greater level of detail. Nevertheless this example does indicate the potential 
for computational methods to draw together quite diverse kinds of information; 
it also illustrates the value of graphic presentation of the complex data generated 
by work in comparative musicology. 
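
The extraction step, pulling coordinates out of each record so that they can be handed to a mapping program, might look something like the sketch below. It assumes that the coordinates are stored as "latitude/longitude" on a line beginning !!!ARL:, as in Figure 6.4; the function name is invented, and the mapping itself is left out.

```python
def read_coordinates(record: str):
    """Return (latitude, longitude) from a record's !!!ARL line, or None."""
    for line in record.splitlines():
        if line.startswith("!!!ARL:"):
            latitude, longitude = line.split(":", 1)[1].strip().split("/")
            return float(latitude), float(longitude)
    return None

record = (
    "!!!OTL: DEUTSCHLAND DEUTSCHLAND UEBER ALLES\n"
    "!!!ARL: 51.5/10.50\n"
    "**kern\n"
    "*-\n"
)

coordinates = read_coordinates(record)
print(coordinates)
```

Run over every record in a collection, and combined with a test for mode, a routine of this kind would produce the two lists of points plotted in a pair of maps such as those in Figure 6.6.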

Figure 6.6 Distribution of major- and minor-mode songs (generated using the Geo-Music site) 

A final study (Huron and Berec, forthcoming) is perhaps even more suggestive 
in terms of possible musicological applications. Like Hofstetter's attempt (mentioned 
in chapter 1) to turn a claim about the "spirit of nationalism" into an empirically 
testable proposition, Huron's and Berec's starting point is a woolly, common-language 
concept: idiomaticism. How might you set about defining what is idiomatic on, say, 
the trumpet in such a way that a computer could make evaluations of just how idiomatic 
a particular piece of music is? In the first place, Huron and Berec say, idiomaticism 
is not the same as difficulty: a piece of trumpet music can be difficult but idiomatic, 
or easy and unidiomatic. It is rather a matter of achieving the desired musical 
effect in the easiest possible manner. For example, a piece will be idiomatic if it is sig- 
nificantly easier to play as written than when transposed up or down by say a semi- 
tone; this is "transposition idiomaticism." So Huron and Berec evaluated the diffi- 
culty of a number of pieces when played in all transpositions from an octave below 
the original to an octave above, and in several cases the original proved to be signif- 
icantly easier to play than most or all of the transposed versions. (They also ran a 
parallel test based on how much more difficult the music was when played at a dif- 
ferent tempo from the notated one, with similar results.) But there was also a further 
finding: the pieces had been selected so as to include examples by both trumpet vir- 
tuosi and nontrumpeters, and the conclusion in respect of both transposition and 
tempo idiomaticism was the same — that expert trumpeters compose more idiomat- 
ically than composers who cannot play the instrument. 
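
The logic of the transposition test can be sketched independently of any particular instrument. Everything here is an assumption made for illustration: the difficulty measure is a toy (distance from an arbitrarily chosen comfortable pitch), the function names are invented, and the criterion "easier than every transposition" is stricter and cruder than the statistical comparison the study describes.

```python
def transposition_profile(notes, difficulty, span=12):
    """Difficulty of a passage at every transposition from -span to +span semitones."""
    return {t: difficulty([n + t for n in notes]) for t in range(-span, span + 1)}

def is_transposition_idiomatic(notes, difficulty, span=12):
    """True if the passage as written is easier than every transposed version."""
    profile = transposition_profile(notes, difficulty, span)
    return all(profile[0] < d for t, d in profile.items() if t != 0)

def toy_difficulty(notes, comfortable=67):
    """Hypothetical measure: total distance from a comfortable central pitch."""
    return sum(abs(n - comfortable) for n in notes)

centred = [65, 67, 69]     # sits around the comfortable pitch
high = [72, 74, 76]        # would be easier if transposed down

print(is_transposition_idiomatic(centred, toy_difficulty))
print(is_transposition_idiomatic(high, toy_difficulty))
```

The design point is that the test itself knows nothing about trumpets: all the instrument-specific knowledge lives in the difficulty function that is passed in, which is where a performer model would be substituted.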

That conclusion might be considered predictable, not to say obvious (just as in 
the case of Hofstetter's discovery that there are differences between national styles). 
And it would have been of limited interest if Huron and Berec had simply asked 
trumpeters to play the pieces at different transpositions and at different tempos, and 
to say how difficult they were, an approach that would not have required Humdrum, 
of course. But instead — and this is the real point of their study — they attempted 
something much more challenging: what they refer to as "the design and implementation 
of a computer model of a trumpet performer." Their model takes a kern 
file as its input, and evaluates the difficulty of performance along a variety of differ- 
ent dimensions: in terms of the valve transitions between successive notes; in terms 
of register, particularly in the case of sustained notes; in terms of dynamics; and in 
terms of tonguing at different speeds. (The information on which they based the 
model included performer assessments in the case of valve transitions, tests of per- 
formance in the case of tonguing, and physiological data concerning lung capacity 
and diaphragm support in the case of sustained notes.) The model was tested using 
a series of trumpet studies set for different grades by the Royal Conservatory of 
Music of Toronto; there was a generally fairly high correlation between its estimates 
of their difficulty and the Royal Conservatory's. But of course the main test was the 
evaluation of idiomaticism, and the fact that the model came up with what everybody 
knows confirms the extent to which it actually works. That is the real conclusion 
to be drawn from the study. 
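
One of the dimensions mentioned above, valve transitions, lends itself to a toy illustration. The sketch below is emphatically not Huron and Berec's model, which rested on performer assessments: it assumes the standard valve combinations for written pitches from middle C to the C above (given here as MIDI-style numbers, with 60 as written middle C) and a made-up cost equal to the number of valves that change state between notes.

```python
# Standard trumpet valve combinations for written C4 (60) to C5 (72).
VALVES = {
    60: frozenset(),          61: frozenset({1, 2, 3}), 62: frozenset({1, 3}),
    63: frozenset({2, 3}),    64: frozenset({1, 2}),    65: frozenset({1}),
    66: frozenset({2}),       67: frozenset(),          68: frozenset({2, 3}),
    69: frozenset({1, 2}),    70: frozenset({1}),       71: frozenset({2}),
    72: frozenset(),
}

def valve_cost(a: int, b: int) -> int:
    """Hypothetical cost: how many valves change state between two notes."""
    return len(VALVES[a] ^ VALVES[b])

def passage_cost(notes) -> int:
    """Sum of the transition costs over a passage."""
    return sum(valve_cost(a, b) for a, b in zip(notes, notes[1:]))

arpeggio = [60, 64, 67]                    # written C, E, G
up_a_semitone = [n + 1 for n in arpeggio]  # written C sharp, F, G sharp

print(passage_cost(arpeggio), passage_cost(up_a_semitone))
```

Even so crude a measure distinguishes between transpositions of the same figure, which is the property the evaluation of idiomaticism depends on.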

Do musicologists really need a computer model of a trumpet performer? One 
answer to this is that the approach could be readily generalized to other instruments: 
to the recorder (where finger transitions would be of particular importance, owing 
to the complexity of cross-fingerings), to the piano (the model might propose the 
best fingering, which could be tested against those specified by editors or used by 
performers), or to the guitar (in effect a modeling of the dance of the fingers on the 
fretboard). But the more substantial answer lies behind this. One of the chronic 
problems with analyzing music is the tendency toward abstraction: the more so- 
phisticated our analytical models, the more divorced the object of analysis seems to 
become from the experience of music, and especially from the sense of physical en- 
gagement in which much if not all music has its source. It is conventional to deplore 
this, but to do nothing about it. In this context a computer model of the electric guitarist's 
left hand could become, perhaps paradoxically, a means of thinking the body 
back into musical analysis: you could, to give just one example, chart the interaction 
of the dancing fingers with the ear — of the imperatives of the fretboard with those 
of "purely musical" implication and realization — in rock guitar improvisation. In 
this way computational musicology holds out the promise not only of providing 
more robust answers to questions that have already been asked, but also of making 
it possible to ask new questions. 

Conclusion: Brave New World? 

Promises, promises . . . and it must be admitted that computational software has a 
hardly better record of delivering on its promises than dot com companies. Let's stick 
with Humdrum as the most likely current candidate for a powerful, general-purpose 
computational aid for musicology, and explore some of the practicalities of doing 
musicology with it. 

Q. Is it easy to get hold of? 

A. Yes, at the time of writing you can download it for free, or you can buy it 
for the cost of the materials from the Center for Computer Assisted 
Research in the Humanities at Stanford. 14 

Q. Will it actually run on my computer? Don't I have to be using a UNIX 
machine? 

A. You can run it on a PC or a Mac, but you will need to install a UNIX 
toolkit. You can probably get that for free, too. 15 

Q. But will I be able to find the music I want to work on in kern code? 

A. That depends what you want to work on. The Essen package is available 
in kern, and so is some of the Densmore collection of Native American 
music, so if you are interested in folk music you can start work right 
away. As for Western "art" music, there is quite a lot out there: not only 
Bach's chorales but also most of his cantatas, as well as the Well-Tempered 
Clavier; substantial amounts of Corelli, Vivaldi, Handel, and Telemann; 
chamber and orchestral music by Haydn and Mozart; and most of the 
Beethoven symphonies. 16 The hope, of course, is that as more musicolo- 
gists use Humdrum, so more music will be encoded, encouraging more 
musicologists to use Humdrum — and in this way you get a virtuous 
circle. (The same applies to the development of the tools themselves.) 

Q. And if I can't find what I want? 

A. If it exists in some other code you may be able to translate it into kern. 
Translation routines exist for the MuseData code used by the Center for 
Computer Assisted Research in the Humanities (CCARH), the Plaine and 
Easie Code used for RISM (Repertoire international des sources musicales), 
and MUSTRAN, as well as the code used by Leland Smith's music notation 
program SCORE, and the "Enigma" code used by Finale. Not all of these 
translators are readily available or unproblematic, but if you still can't 
find what you want you can of course encode your own. Rather than 
doing it by typing in the code, you can use an encoding routine that 
enables you to enter the music on a MIDI keyboard, rather like step-time 
entry in a sequencer package. There are also utilities for playing 
kern code (via MIDI), for displaying music notation on screen, or for out- 
putting it as a PostScript file (though not all of these are available if you 
are running Humdrum on a PC or a Mac). You can turn kern files into 
Finale ones for printing, too. 

Unfortunately, none of this addresses the real difficulties of integrating Hum- 
drum into everyday musicological life. The fact is that not everybody is happy with 
a UNIX command-line environment, or finds it easy to remember the different com- 
mands with their multiple (and often complex) options; people who use UNIX (or 
Humdrum) on a daily basis can operate it much more quickly than a graphic envi- 
ronment (like Windows, with its drop-down menus and mouse control), but if 
Humdrum is to become part of everyday musicological life then what matters is its 
accessibility to the occasional user. Michael Taylor (1996) has developed a graphical 
user interface (GUI) for Humdrum, though it is not publicly distributed at the time 
of writing, 17 and this makes life considerably easier for the occasional user: there are 
drop-down menus listing the various Humdrum commands and dialog boxes where 
you set the associated options, and you can even select musical elements using ordi- 
nary language ("any bar line," "at least one flat," or "C major chord," for instance). 18 
For nonspecialist users such an interface — if generally available and supported — 
would surely represent a substantial step in the right direction: it not only looks fa- 
miliar but also constantly reminds you of the commands and options that are avail- 
able, without significantly compromising Humdrum's functionality and flexibility. 

But such an approach can only go so far. This is because, to use Humdrum at 
all, you need quite a detailed knowledge of its representations (kern and the rest); a 
great deal of Humdrum usage has to do with things like extracting the spine you are 
interested in, merging spines, or stripping out information that is not wanted for any 
particular operation — and even ordinary-language definitions cannot protect you 
from the need to understand what is going on in the code. For this reason Andreas 
Kornstadt (1996) advocates a different approach: instead of "a Humdrum 'command 
center' with lots of buttons and gadgets with names like yank and MIDI | smf," as 
he puts it (1996: 119), he has developed an analytical environment into which dif- 
ferent GUI modules can be slotted for different applications. Each module can be 
thought of as a "browser" that lets users view just those musical elements that they 
are interested in, and no others; for instance, Kornstadt has created such a system 
for leitmotivic analysis, in which users can define leitmotifs and locate their occur- 
rence in a score on the basis of on-screen notation and intuitive commands. This, 
again, is not publicly available, but another of Kornstadt's customized applications 
of Humdrum is: an on-line thematic dictionary, called Themefinder. 19 You type in 
some pitches from the tune you want to find (or alternatively you can type in the 
scale degrees, or the intervals between the notes, or even just its contour), and the 
software finds all the matches in its database and displays them on screen in musi- 
cal notation. There are no Humdrum commands; you don't even need to know it is 
Humdrum that is doing the work — and what makes this possible is that Theme- 
finder is designed to do one thing and one thing only. It is hard at the present time 
to tell how far Kornstadt's approach (the development of specialized systems for 
different analytical purposes) will prove to be compatible with the flexibility and 
unpredictability of real-world musicological research, or whether it represents a par- 
ticularly sophisticated way of falling into the same traps as 1980s office software. 
There is also the question of how the substantial costs of developing and updating a 
comprehensive system of this kind are to be found. 
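
The single-purpose design of a thematic dictionary, as described above for Themefinder, can be suggested by a sketch of its simplest kind of query, the search by contour. This is not Themefinder's implementation: the letter codes, the function names, and the two-entry database are all invented for illustration.

```python
def contour(pitches):
    """Reduce a melody to U (up), D (down), and R (repeat) between successive notes."""
    return "".join(
        "U" if b > a else "D" if b < a else "R"
        for a, b in zip(pitches, pitches[1:])
    )

def find_by_contour(query, themes):
    """Names of all themes whose contour contains the query string."""
    return [name for name, pitches in themes.items() if query in contour(pitches)]

# A hypothetical two-entry database, pitches as MIDI-style numbers
themes = {
    "Deutschland über alles": [65, 67, 69, 67, 70, 69, 67, 64, 65],
    "Rising scale": [60, 62, 64, 65, 67],
}

print(contour(themes["Deutschland über alles"]))
print(find_by_contour("UUD", themes))
print(find_by_contour("UU", themes))
```

The user of such a system types a query and reads a list of matches; the representation and the matching stay out of sight, which is exactly what a tool built to do one thing can afford.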

Maybe, in the coming years, the combination of user-friendly interfaces for 
Humdrum and a developing body of musicological work using it will provide the 
necessary momentum for it to become more generally used, so that in time it will 
become just one of those skills that musicologists have to acquire (like using nota- 
tion software, for example). Or maybe that is just not going to happen; in that case, 
computational software like Humdrum might still become part of the wider musi- 
cological scene, but as a specialist professional service, rather than a skill you acquire 
for yourself (rather like note processing was a decade or so ago). In other words, if 
you had a project in which computational approaches could play an important 
part — in generating or testing hypotheses on the basis of a large body of data, for in- 
stance — then you would call in the services of a specialist computational musicolo- 
gist, who would arrange for any necessary encoding of data, carry out the analysis, 
and provide advice on the interpretation of the results. What the ordinary musicolo- 
gist would need to know, then, would be not how to extract or collapse Humdrum 
spines, but how to formulate a research question in such a way that advantage can 
be taken of computational methods to answer it more securely than is possible using 
the methods traditional in data-poor fields. Whether anyone could make a decent 
living as a specialist computational musicologist is another matter. 

Using computers for musicology is like using computers for anything else: you 
need to have reasonable expectations about what they can do. However open and 
flexible its design, any musicological software is a tool, and like any tool it is good 
for some things and not for others. The articles by Huron and his coworkers that I 
have described in this chapter might be characterized as finding musicological uses 
to which computational tools can be put; in essence, they illustrate the software's po- 
tential, and they do so in a manner that owes more to social-scientific discourse than 
to traditional humanities writing, with their explicit hypotheses, control groups, 
quantification of significance, and formulation in what Huron (1999b) has termed 
the "boilerplate language" of scientific inquiry. Perhaps that is only to underline the 
distinction between "cognitive" or "systematic" musicology, as such work is sometimes 
called, and the close attention to context and what might be termed epistemological 
pluralism of the traditional discipline. But what I am suggesting is that musicology in 
the broadest sense can take advantage of computational methods and transform itself 
into a data-rich discipline, without giving up on its humanist values. Understood that 
way, scientific discourse is unlikely to offer a general model for the musicology of the 
future: it is up to musicologists to develop such a model. And that does not mean try- 
ing to find musicological uses to which computational tools can be put, but the re- 
verse: discovering the tools that are most appropriate to a particular musicological 
purpose, and taking full advantage of them. At present we hardly have such a thing as 
a model of how sophisticated computational approaches might be integrated within 
a sustained, musicologically driven project. Yet the tools are ready and waiting. 20 
Notes 



1. The routines used to create these graphs, which took advantage of the NeXT com- 
puter's Postscript display, have not to date been disseminated, but the principles 
underlying them were described in chapter 20 of Brinkman 1990. 

2. Helmut Schaffrath, Essen Musical Data Package (Menlo Park, Calif.: Center for Com- 
puter Assisted Research in the Humanities [CCARH], 1995) [data and analytical 
software on four floppy discs for MS-DOS]; 6,225 folksongs were included in this 
release. Both data and software are now available at www.esac-data.org. (All URLs 
were correct at press time, but Web resources change rapidly. Search engines may 
provide a better means of access to the resources listed in this chapter.) 

3. A full listing of the analytical output for "Deutschland über alles," reformatted and 
annotated for clarity, may be found in Selfridge-Field 1995; see also Schaffrath 
1992, 1997. 

4. Since EsAC consists entirely of ASCII characters, it is possible to import the records 
into modern databases. However, researchers now generally prefer to access the kern 
version of the databases and process them using Humdrum (see note 7 below). 

5. Hence the need for translation programs such as POCO (see chapters 5 and 8, this 
volume). 

6. Digital Alternate Representation of Musical Scores; for details see Selfridge-Field 
1997, chapters 11-15. 

7. David Huron, The Humdrum Toolkit: Software for Music Researchers (Stanford, Calif.: 
Center for Computer Assisted Research in the Humanities, 1993 [three floppy discs and 
16-page installation guide]). A wide range of information may be accessed from the 
Humdrum Toolkit home page (http://dactyl.som.ohio-state.edu/Humdrum; links to 
on-line resources are at http://dactyl.som.ohio-state.edu/Humdrum/resources.html). 

8. UNIX is an operating system, like MS-DOS or Windows, but includes a large num- 
ber of general-purpose tools and thus fulfils many of the functions of a programming 
environment as well. Developed in the late 1960s, it remains the environment of 
choice for many professional software developers and researchers. 

9. For a summary of kern see Huron 1997; for detailed descriptions see Huron 1995, 
1999a. 

10. Lines beginning with asterisks are known as "interpretation records," those with two 
asterisks being inclusive interpretations (something is either in kern or it isn't), and 
those with one asterisk being tandem interpretations (of which you can have any 
number). 

11. In a review of the Humdrum Toolkit, Jonathan Wild (1996) includes what is essentially 
a tutorial on this precise task; assuming that the chorales have been concatenated 
into one file, the entire analysis consists of seven commands pipelined to one 
another — in other words, it can be executed with a single command line. Further 
analytical applications of Humdrum are discussed in Huron 2002, which appeared 
after this chapter had gone into production. 

12. See this volume, pp. 7-8. 

13. A follow-up study (Hippel 2000) reanalyzed the experimental data on the basis of 
which Burton Rosner and Leonard Meyer had claimed that gap-fill patterns reflect 
listener expectations; von Hippel concludes that the apparent effects of gap-fill 
expectations were an artifact of experimental design (specifically, effects of training). 
Of course listeners may have such expectations, but von Hippel's and Huron's point 
is that these are the result of melodic patterns and not the other way around. 
14. See the Humdrum download page at 
http://dactyl.som.ohio-state.edu/HumdrumDownload/downloading.html. Additional tools developed by Craig Sapp 
can be downloaded from http://www.ccarh.org/software/humdrum/museinfo. 

15. UWIN (free to educational/research users) is accessible through the Humdrum 
download page (see n. 14). Commercial alternatives are available from Mortice Kern 
Systems (MKS Toolkit, like UWIN for Windows) or from Tenon Intersystems (for 
Mac). There is a further alternative: a number of Humdrum commands can be run 
online at http://musedata.stanford.edu/software/humdrum/online; you select the 
command you want, supply the input data (by pasting it into a window, uploading a 
file, or supplying its URL), set the options, and receive the output as plain text or in 
HTML. Twenty of the 70 or more Humdrum commands are currently available in 
this way, with an emphasis on data translation and conversion. (Not accessible at 
publication time.) 

16. There are two principal sources for such material (from both of which they may be 
obtained free): the KernScores site at http://kern.humdrum.net, and CCARH's 
MuseData site (http://www.musedata.org). 

17. "Humdrum Toolkit GUI" currently exists only in a 16-bit version, but further devel- 
opment (and eventual distribution) is planned by the Sonic Arts Centre at Queen's 
University, Belfast. 

18. Only a few of these "regular expressions," as they are known in UNIX jargon, are 
built into the interface, but you can add your own, along with new interpretations; 
this is where you would define **timing or **heartbeat. 

19. Accessible at http://www.themefinder.org; for a brief description see Kornstadt 
1998. 

20. My grateful thanks to David Huron, who has influenced this chapter in many ways, 
not least through making it possible for me to work with him in 2000 as a visiting 
scholar at the Cognitive Musicology Laboratory, Ohio State University, Columbus. 
CHAPTER 7 



Modeling Musical Structure 



Anthony Pople 



More than 20 years ago, music analysis was famously described by Ian Bent as "that 
part of the study of music which takes as its starting-point the music itself, rather 
than external factors" (Bent 1980: 341). Indeed, analysis is generally motivated by a 
desire to encounter a piece of music more closely, to submit to it at length, and to be 
deeply engaged by it, in the hope of thereby understanding more fully how it makes 
its effect. It is perhaps not surprising, then, that if you were to take a look at the kinds 
of writing that have at one time or another been thought of as music analysis, the va- 
riety would be immense. To a large extent this is because of the personal element in 
analysis: a piece of analytical writing is almost always the work of one person, and 
is founded in that person's own experience of an individual work. But even in this 
regard music analysis is significantly different from music criticism, or indeed liter- 
ary criticism, because analysts generally try to play down the fact that their analyses 
are dependent on a personal viewpoint. Although music analysis certainly does have 
a critical dimension as one of its characteristic attributes (see Pople 1994), a writer 
of music analysis will in general try to present his or her observations as represent- 
ing a musical experience that can be shared, and will address the reader as a kindred 
spirit eager to inquire about the piece in the same terms as the author has done. 

This balance between the personal point of view and the potential for captur- 
ing shared experience makes music analysis a domain that is not only by its very na- 
ture empirical, but also one on which formal empirical methods can be brought to 
bear. However, the fact that one can say this in relation to analysis as the term is 
understood today is very much a consequence of the prevailing close relationship 
between music analysis and music theory. 

As Nicholas Cook has pointed out, there is a broad historical distinction be- 
tween music theory — which studies musical works in order to deduce "more gen- 
eral principles of musical structure" — and music analysis, in which the interest is 
focused on individual pieces of music (Cook 1987: 7). Since about 1970, without 
contradicting this distinction, a symbiotic relationship has arisen between theory 
and analysis: theory has developed by using analysis as a kind of test-bed, while pro- 
fessional analysts have by and large conscientiously used the language of contem- 
porary theory to express their insights. This scenario is sufficiently close to the sci- 
entific model of hypothesis and experiment to have aroused antagonism from some 
humanities scholars (see, for example, Snarrenberg 1994: 50-55). But it has also 
allowed other scholars from disciplines that are more committed to formal 
methodologies, including cognitive psychologists and computer scientists, to see 
music analysis as a potentially fruitful domain of research. Some of what has been achieved is 
remarkable, but from a music analyst's point of view the question is: can it engage 
the reader's musical interest as deeply as can analytical writing based on expert 
human contemplation? 

It is important to bear this in mind if one is to weigh up the value of differing 
approaches in which the empirical quality so fundamental to analysis is treated more 
formally. I have written elsewhere that "when people read something called 'music 
analysis' they assume that they will be 'told something about the piece'," and that in 
this respect there are specifically two types of failure that can occur if the analysis 
fails to engage sufficiently with the personal, empirical, critical and didactic aspects 
of analysis: 

The kind of . . . writing that remains tangled up in its own high-level assump- 
tions, and so deals with musical detail in an anecdotal fashion . . . certainly 
"tells us something"; but, to those who expect the strategic sweep of analysis 
to range constructively through to the personal/empirical, what it tells us 
may not seem to be "about the piece." Conversely, work that is excessively 
bottom-up . . . may remain tangled in the empirical and fail to cross other 
than trivially into the critical/didactic domain; it is certainly "about the 
piece," but may "tell us" little or nothing. (Pople 1994: 121) 

As Cook points out, good analysis does not merely reflect and describe experience, 
but also has the ability to make us hear the music differently (Cook 1987: 228- 
229). Together, these observations constitute a set of linked criteria that should be 
borne in mind as we examine in detail such issues as the place of rigorous technical 
language in music theory, the quasi-scientific approach to the relationship between 
theory and analysis, and the modeling of music analysis from interdisciplinary per- 
spectives. 



Formal Theory, Informal Analysis 

Take, for example, the development of a rigorous terminology for the description of 
serial music at Princeton University in the 1960s, under the influence of the math- 
ematically trained composer and theorist Milton Babbitt. In an article on "Twelve- 
Tone Rhythmic Structure and the Electronic Medium" published in the inaugural 
issue of the journal Perspectives of New Music, Babbitt outlined a theory of serial 
rhythm using language that mathematicians would regard as informal, but which to 
classically trained musicians might seem mathematical: 

[As] a means of informally evaluating the temporal constraints imposed by 
the formation of a twelve-tone set, I shall assume on purely empirical grounds 
that there are eleven qualitatively significant temporal relationships which can 
hold between two musical (say, pitch) events. Let x and y designate these 
events, and let a left parenthesis signify the time point initiation of the event 
and a right parenthesis signify the time point termination . . . 
1. x) < (y. [that is, the termination of x precedes the initiation of y] 

2. (x < (y; x) < y); but x) ≮ (y [that is, the initiation of x precedes that of y; 
the termination of x precedes that of y; but the termination of x does not 
precede the initiation of y] 

3. (x < (y; x) ≮ y); y) ≮ x). 

4. (x < (y; y) < x). 

[etc.] (Babbitt 1962: 52-53) 

As Babbitt points out, what he writes is derived from empirical analysis of musical 
events, but his observations have been transmuted into abstract terms that can sub- 
sequently be treated as theory. (This is in line with the distinction between theory 
and analysis described by Cook.) But although Babbitt's process can in turn be re- 
versed, by using the abstract terms as a method for analysis, this way of working has 
never caught on among analysts — most likely because, although one can look for 
concrete examples of Babbitt's equations, this isn't going to succeed, other than in a 
trivial way, in helping us to hear the music differently. 
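
To make the notation in Babbitt's list concrete, the four relationships quoted above can be written as comparisons of time points. This is a minimal sketch under my own assumptions: each event is represented as a pair (initiation, termination), the function name is hypothetical, and the seven relationships that the quotation elides are deliberately not reconstructed.

```python
def babbitt_relation(x, y):
    """Return 1-4 for the relationships quoted from Babbitt, else None.

    x and y are (initiation, termination) pairs of time points
    (an assumed encoding; Babbitt writes these as "(x" and "x)").
    """
    x_start, x_end = x
    y_start, y_end = y
    if x_end < y_start:
        return 1  # x) < (y
    if x_start < y_start and x_end < y_end and not x_end < y_start:
        return 2  # (x < (y; x) < y); but x) does not precede (y
    if x_start < y_start and not x_end < y_end and not y_end < x_end:
        return 3  # (x < (y; neither termination precedes the other
    if x_start < y_start and y_end < x_end:
        return 4  # (x < (y; y) < x)
    return None  # one of the relationships not quoted in the text
```

Read this way, relationship 3 describes two events that end together, and relationship 4 an event y that lies wholly inside x.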

A second example, as interesting for its similarities to the Babbitt case as its dif- 
ferences — and far more productive analytically — is the set-class theory of Allen 
Forte, the principal concepts of which were outlined in his book The Structure of 
Atonal Music (Forte 1973). Forte's theory aimed to provide a system that would en- 
able the analysis of atonal music to achieve two important objectives: (1) to avoid all 
vestiges of tonal terminology (e.g., convolutions such as "minor seventh chord with 
the fifth omitted but with both raised and lowered elevenths"); and (2) to enable ab- 
solutely any configuration of notes to be given a label for purposes of discussion and 
comparison. These two objectives were met simultaneously by presenting a means 
for all pitch configurations to be labeled in an entirely new way making no reference 
to any tonal categories. 

The labeling system depends on three basic concepts: pitch-class, pitch-class 
set, and set-class. (Unfortunately, the distinction between the second and third of 
these concepts was initially obscured by Forte's presentation, though it has been 
clarified in the work of later writers.) As the prevalence of the word "class" indicates, 
the underlying principle is one of classification. To understand what a pitch-class is, 
ask yourself what is the tonic of a piece in C major: the answer, naturally enough, is 
"C," but not any particular C — rather it is a category that stands for all the notes C, 
irrespective of further considerations like which register the C is in, whether it is a 
crotchet or a quaver, whether the tempo is fast or slow, whether it is played loudly 
or softly, or by whichever instrument. And since Forte is concerned with the totally 
chromatic world of atonal music he doesn't distinguish between C and its enharmonic 
equivalents B♯ and D♭♭; he simply labels the keys on the keyboard with integers 
starting with 0 for the keys that play B♯, C, or D♭♭ and working upward to 11 
for the keys that play B natural (or A double sharp or C♭). A pitch-class set, naturally enough, is 
a set of pitch-classes, such as [0, 3, 4, 6]: these are, if you like, C, E♭, E, and F♯ (or 
their enharmonic equivalents), but it doesn't matter what order they are heard in, or 
whether they are played at the same time, or anything of that kind — it is simply a 
collection of the four pitch-classes, taken in the abstract. 
Finally, a set-class is a classification of pitch-class sets, so that the vast number 
of possibilities can be reduced to manageable proportions (in the case of Forte's 
system, about 200 set-classes in all). Pitch-class sets (abbreviated to pc sets) are 
reckoned to belong to the same class if they are related by transposition of all their 
constituent pitch classes by the same interval, and/or inversion of the intervals between 
the constituent pcs. This can be expressed numerically: two of the pc sets that 
belong in the same class as our set [0, 3, 4, 6] are [1, 4, 5, 7] — transposing each pc up 
a semitone — and [0, 2, 3, 6], which though it's harder to see at first is simply the 
original set turned upside down. Forte gives this set-class the composite label 4-12, 
because it has four pcs in it and is 12th in his list of the four-note set-classes, 
organized according to the intervals they contain. 
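
The membership test just described, transposition and/or inversion, is simple enough to state as code. The following is a minimal sketch, assuming pitch-classes are given as integers 0 to 11; the function names are my own, and the test tries all twelve transpositions of a set and of its inversion rather than computing Forte's labels.

```python
def transpose(pcs, n):
    """Move every pitch-class up by n semitones (mod 12)."""
    return frozenset((p + n) % 12 for p in pcs)

def invert(pcs):
    """Turn the set upside down around pitch-class 0."""
    return frozenset((-p) % 12 for p in pcs)

def same_set_class(a, b):
    """True if b is a transposition of a, or of the inversion of a."""
    a = frozenset(p % 12 for p in a)
    b = frozenset(p % 12 for p in b)
    return any(transpose(a, n) == b or transpose(invert(a), n) == b
               for n in range(12))
```

With these definitions, same_set_class([0, 3, 4, 6], [1, 4, 5, 7]) and same_set_class([0, 3, 4, 6], [0, 2, 3, 6]) both hold, matching the two examples in the text.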

That, in a nutshell, is the theory, but what of analysis? Taken on its own, Forte's 
labeling system is as lacking in operational specifics as are Babbitt's equations. But the 
prospect of turning the theory around, and looking for examples of pc sets in musical 
scores, turned out to be far more appealing in Forte's case. Inspired by the analytical 
examples that accompanied Forte's exposition of his theory in The Structure of Atonal 
Music, many analysts took up his approach and found in it a new and productive way 
of approaching previously intractable scores by composers such as Schoenberg, Webern, 
Berg, Ives, Bartók, and Stravinsky. What demands our attention here is the fact 
that, while Forte's labeling system involves the rigorous application of a few easily 
grasped principles, the analytical application of the theory has evolved over time as 
a constellation of customs and practices. We should look at this in more detail. 

To find pc sets in a musical score you have to divide it up into groups of notes, 
a process known as segmentation. For example, Bryan Simms's segmentation of bars 
20-21 of the last of Berg's Four Songs Op. 2 (Figure 7.1) interprets the piano part 
as a sequence of chords (shown by enclosing the notes in rectangular boxes), and ig- 
nores the voice part altogether. 

Looking at the piano part in this way allows Simms to identify the chords as ex- 
amples, alternately, of set-classes 4-Z29 and 4-Z15 (Simms 1993). Another ana- 
lyst, John Doerksen, finds these set-classes too, but decides in addition that it is 
worth identifying the set-classes 3-8 and 6-Z13 by segmenting the vocal line after 
the first note in bar 21 (Doerksen 1998; see Figure 7.2). He also identifies the larger 




[Figure 7.1 labels, in order: 4-Z29 [7,8,10,2]; 4-Z15 [1,3,6,7]; 4-Z29 [5,6,8,0]; 4-Z15 [11,1,4,5]; 4-Z29 [3,4,6,10]] 



Figure 7.1. Segmentation and set-class analysis by Bryan Simms of Berg, "Warm die 
Lüfte," bars 20-21 (from Simms 1993: 124). 
Figure 7.2. Segmentation and set-class analysis by John Doerksen of Berg, "Warm die 
Lüfte," bars 19-21 (from Doerksen 1998: 199). 



set-class 8-5 that is formed by these two segments taken together, and in a similar 
way he groups two of the adjacent piano chords to form 7-20, while adding the 
notes of the vocal line at this point gives him 8-8. 

The next stage is to take stock of the set-classes and to find relationships be- 
tween them. Forte's theory offers various relationships "off-the-shelf," as it were, 
such as the "Z" relationship that holds between 4-Z15 and 4-Z29, meaning that 
the same distribution of intervals is to be found in these two classes. (For further the- 
ory concerning set-class relationships see Forte 1973, Rahn 1980, and the excellent 
summary in Castrén 1994.) 
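
The "same distribution of intervals" that defines the Z relationship can be checked by counting interval-classes. The sketch below assumes integer pitch-classes and takes [0, 1, 4, 6] and [0, 1, 3, 7] as representatives of 4-Z15 and 4-Z29; the function names are my own. Two sets count as Z-related here when their interval counts agree although neither is a transposition or inversion of the other.

```python
from itertools import combinations

def interval_vector(pcs):
    """Count how often each interval-class (1 to 6) occurs between pairs of pcs."""
    pcs = sorted({p % 12 for p in pcs})
    vector = [0] * 6
    for a, b in combinations(pcs, 2):
        distance = (b - a) % 12
        interval_class = min(distance, 12 - distance)
        vector[interval_class - 1] += 1
    return tuple(vector)

def same_set_class(a, b):
    """True if b is a transposition of a, or of the inversion of a."""
    a = frozenset(p % 12 for p in a)
    b = frozenset(p % 12 for p in b)
    inverted = frozenset((-p) % 12 for p in a)
    return any(frozenset((p + n) % 12 for p in form) == b
               for form in (a, inverted) for n in range(12))

def z_related(a, b):
    """Same interval content, but not the same set-class."""
    return interval_vector(a) == interval_vector(b) and not same_set_class(a, b)
```

Both representatives contain exactly one instance of every interval-class, which is why these two set-classes are often called the all-interval tetrachords.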

To summarize: the theory of set-classes is written in formal terms, and repre- 
sents a fixed point in the constellation of theory and methods, whereas the associ- 
ated methods of analysis are varied, and even the most fundamental process — that 
of segmentation — depends largely on personal habits of working at an analysis. As 
a result, although many computer programs have been written that make it easy to 
find the set-class designation that corresponds to a list of pitch names, and to deter- 
mine relationships among the set-classes found in an analysis, these all function 
rather like a pocket calculator; they can save time, but they don't engage directly 
with the business of analysis, and so the empirical engagement with the score remains 
entirely the responsibility of the analyst. At best, however, the time-saving 
capability of computer-based tools can speed up the process of trying out an analyti- 
cal hunch to see whether it "works," then modifying it if it doesn't, then trying again, 
and so on. This back-and-forth motion between the musical imagination and the ev- 
idence of the score itself is highly characteristic of analysis. To the extent that there 
is often a corresponding circularity of reasoning involved in set-class analysis — 
equating relative "success" with a segmentation that results in a smaller, more tightly 
related list of set-classes — it may be argued that using a computer program to find 
out how small and tightly related one's list of set-classes is, offers a genuine enhance- 
ment of the overall analytical process. 
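
As an illustration of this "pocket calculator" role, the sketch below counts how many distinct set-classes a segmentation produces. It is not a reconstruction of any published program. The five chords are my reading of the pitch-class labels in Figure 7.1, and the grouping key (the smallest sorted form among all transpositions and inversions) is an arbitrary canonical choice that need not coincide with Forte's prime forms.

```python
def class_key(pcs):
    """A canonical stand-in for the set-class of pcs (not Forte's label)."""
    pcs = frozenset(p % 12 for p in pcs)
    inverted = frozenset((-p) % 12 for p in pcs)
    forms = [tuple(sorted((p + n) % 12 for p in form))
             for form in (pcs, inverted) for n in range(12)]
    return min(forms)

def distinct_set_classes(segments):
    """The set of class keys found among the segments of an analysis."""
    return {class_key(segment) for segment in segments}

# Piano chords as labeled in Figure 7.1 (my reading of the labels).
chords = [
    [7, 8, 10, 2],
    [1, 3, 6, 7],
    [5, 6, 8, 0],
    [11, 1, 4, 5],
    [3, 4, 6, 10],
]
print(len(distinct_set_classes(chords)))
```

A smaller count indicates a more tightly related list, but the program says nothing about whether the segmentation itself is musically convincing; that judgment stays with the analyst.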

Quasi-Formalized Analysis 

Several authors have moved beyond the scenario described above and have at- 
tempted to capture something of the analytical process itself in quasi-formal terms. 
Some of this endeavor has been pedagogical: in textbooks, the need to convey how 
analysis is done tends to lead to stage-by-stage descriptions (see Forte and Gilbert 
1982, Cook 1987, Dunsby and Whittall 1988). Under the rubric Models of Musical 
Analysis (Everist 1992, Dunsby 1993), a number of writers have gone further in 
providing specific, fully worked, examples of analytical practice. Since these are in- 
tended to stand as models for analyses of works other than those used by way of ex- 
ample, their broad aim is to capture, albeit in particular musical contexts, the con- 
ceptual and operational processes that analysis entails. 

Something similar may be said of the generative theory of tonal music developed 
by Fred Lerdahl and Ray Jackendoff in the late 1970s and early 1980s (Lerdahl 
and Jackendoff 1983), an approach developed further since that time in a number 
of publications by Lerdahl (notably Lerdahl 2001). Their system of formal (and in 
some cases informal) rules aims to reproduce the kinds of judgment that human mu- 
sicians make about tonal music of the common-practice period. As they put it, "We 
take the goal of a theory of music to be a formal description of the musical intuitions 
of a listener who is experienced in a musical idiom" (1983: 1; emphasis in the origi- 
nal). This statement tells us that Lerdahl and Jackendoff are making a psychological 
claim about the accuracy of their theory with respect to human intelligent behavior. 
In other words, they aim not merely to come up with a description of the music that 
in some way matches a description that a listener might make, but also to describe 
the way in which the listener might make it — to shed light on the psychological 
processes that the listener might employ, even though to the listener those processes 
might remain no more than felt or tacitly understood. 

Although Lerdahl and Jackendoff's belief that rule-systems can capture the psychological 
processes of an experienced listener should be evaluated primarily on its 
own terms, practical considerations demand that their rules produce tangible out- 
put that can be compared with the sorts of description of a musical score that might 
be made by a human being. And since the tangible descriptions made by humans 
are, de facto, music analyses, it follows that Lerdahl and Jackendoff's rules produce 
music analyses as well. Importantly, however, their system does not replicate exist- 
ing kinds of analytical output — otherwise they would merely be claiming to show 
the psychological processes that produce, for example, a Schenkerian analysis. 
Instead, their theory introduces new analytical concepts of its own, but because they 
recognize that various established methods of analysis do express something of the 
musical intuitions of an experienced listener, they appropriate a number of ideas that 
are familiar to analysts. 

Lerdahl and Jackendoff's theory is set out in four sets of "well-formedness rules" 
which model different types of musical thought. These four limbs of the theory are: 
(1) "Grouping Structure," which describes how the mind articulates a chronological 
stream of musical events by associating events into groups that seem to belong to- 
gether in some way; (2) "Metrical Structure," which they describe as "the regular, hi- 
erarchical pattern of beats to which the listener relates musical events" (1983: 17; 
see Cooper and Meyer 1960); (3) "Time-Span Reduction," which is based on the 
grouping and metrical structures but is expressed reductively, that is, by selecting 
within each articulated time-span an event that is considered the most important; 
and (4) "Prolongational Reduction," which is based on Schenkerian principles. To 
give a flavor of the theory, the following are the first and last Grouping Structure 
well-formedness rules: 

GWFR1 

Any contiguous sequence of pitch-events, drum beats, or the like can consti- 
tute a group, and only contiguous sequences can constitute a group. (Lerdahl 
and Jackendoff 1983: 345) 

GWFR5 

If a group G1 contains a smaller group G2, then G1 must be exhaustively partitioned 
into smaller groups. (Lerdahl and Jackendoff 1983: 345) 
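
Because these two rules are stated so exactly, they can be checked mechanically. The sketch below is my own illustration rather than anything given by Lerdahl and Jackendoff: a group is assumed to be a span of event indices, which makes it contiguous by construction (GWFR1), and the hypothetical function satisfies_gwfr5 tests whether every group that contains subgroups is tiled by them without gaps or overlaps. The intervening rules, which are not quoted here, are not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class Group:
    start: int  # index of the first event in the group
    end: int    # index one past the last event
    children: list = field(default_factory=list)

def satisfies_gwfr5(group):
    """True if every subdivided group is exhaustively partitioned into smaller groups."""
    if not group.children:
        return True
    position = group.start
    for child in sorted(group.children, key=lambda c: c.start):
        smaller = 0 < child.end - child.start < group.end - group.start
        if child.start != position or not smaller:
            return False  # a gap, an overlap, or a child that is not smaller
        position = child.end
    return position == group.end and all(
        satisfies_gwfr5(child) for child in group.children)
```

For example, Group(0, 8, [Group(0, 4), Group(4, 8)]) passes, whereas Group(0, 8, [Group(0, 3)]) fails because events 3 to 7 belong to no subgroup.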

And the following are the first and second Prolongational Reduction well-formedness 
rules: 

PRWFR 1 

There is a single event in the underlying grouping structure of every piece that 
functions as prolongational head. 

PRWFR 2 

An event e_i can be a direct elaboration of another event e_j in any of the following 
ways: 

a. e_i is a strong prolongation of e_j if the roots, bass notes, and melodic notes of 
the two events are identical. 

b. e_i is a weak prolongation of e_j if the roots of the two events are identical but 
the bass and/or melodic notes differ. 

c. e_i is a progression to or from e_j if the harmonic roots of the two events are 
different. 

(Lerdahl and Jackendoff 1983: 351) 
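
The three cases of the second rule amount to a small decision procedure. The following sketch assumes, purely for illustration, that an event can be summarized by three note names (root, bass, and melody); the Event class and the function name are hypothetical and are not part of the theory's own apparatus.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    root: str    # harmonic root, e.g. "G"
    bass: str    # bass note
    melody: str  # melodic (top) note

def elaboration_type(e_i, e_j):
    """Classify how e_i elaborates e_j, following cases a to c of the rule."""
    if e_i.root != e_j.root:
        return "progression"
    if e_i.bass == e_j.bass and e_i.melody == e_j.melody:
        return "strong prolongation"
    return "weak prolongation"
```

So a root-position G minor chord followed by the same harmony with B♭ in the bass would count as a weak prolongation, while a move from G minor to a D major chord would count as a progression.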

In addition, there is for each of these four systems an accompanying set of "preference 
rules" which describe how any conflicts within, or between, these four types 
of judgment might be resolved. These rules are expressed rather differently from the 
"well-formedness rules," because they don't presume to operate the same way in 
every musical situation: their application is not automatic, but depends on a judg- 
ment of the musical context. For example, the first of the preference rules relating to 
Grouping Structure is expressed as follows: 

GSPR 1 Avoid analyses with very small groups — the smaller the less prefer- 
able (Lerdahl and Jackendoff 1983: 345) 

Notice how this rule talks of "analyses"! It does not take the form "Intuitions about 
Grouping Structures prefer . . .," despite the authors' overall intention to describe 
musical intuitions. Of course, it might be possible to rephrase all the preference rules 
in the second way, but that is not the point I want to make. It is rather that the dis- 
tinction between well-formedness rules and preference rules corresponds to a dis- 
tinction one can make about the application of music theory in practical analysis: 
the well-formedness rules describe those overlearned psychological routines that are 
applied intuitively, whereas the preference rules describe the areas of judgment that 
a self-aware analyst will handle explicitly through conscious thought. One might 
argue about exactly where the line is drawn — indeed, it may depend very much on 
the individual analyst's capacity to avoid making unwarranted assumptions — but 
the line is there to be drawn nonetheless. 

What does all this mean in practice? Figure 7.3 shows a composite analysis by 
Lerdahl and Jackendoff of the opening of Mozart's 40th Symphony, shown in re- 
duced score with bar numbers (Lerdahl and Jackendoff 1983: 259). The metrical 
structure is shown directly below the score notation by combinations of dots placed 
under the main beats: the more dots there are under a beat, the greater its metrical 
stress. The horizontal brackets underneath the dots show the grouping structure. 
Note how the groups are organized hierarchically in layers of brackets: for example, 
the lowest bracket at level (a) extends notionally well beyond the extract analyzed 
here, and at the next largest level (b) the time-span covers all the music up to the 
point where the opening material recurs; the two time-spans at level (c) lie exactly 
within (b), and so forth. The tree-diagram above the score and the notational sys- 
tems below the grouping brackets together express the time-span reduction. The 
prolongational reduction (Figure 7.4; Lerdahl and Jackendoff 1983: 260) is derived 
in a further stage of analysis. 

Lerdahl and Jackendoff's system charts a process leading from the score as an 
uninterpreted input through to the production of a structural description. This de- 
scription both expresses an interpretation of the score and traces the process through 
which the interpretation comes into being. All of this arises not through psycholog- 
ical experiment or automated data processing of the score, but by contemplation and 
working-out: what emerges, then, is an acutely self-aware form of analysis. And just 
as is the case with set-class analysis, the empirical aspects are explicitly separated 
from the rational (i.e., logically consistent) aspects. Lerdahl and Jackendoff them- 
selves address this issue in terms of a distinction between necessary and sufficient 
conditions: 
For example, there are no necessary and sufficient conditions for a portion
of the musical surface to be judged a group. The grouping well-formedness 
conditions are necessary conditions on groups, but not sufficient. Each pref- 
erence rule is an attribute that creates family resemblances among grouping 
structures, but since every preference rule can be overridden by the proper 
confluence of circumstances, it is a sufficient condition only in the absence 
of conflicting evidence. Where preference rules come into conflict, dubious 
judgments of grouping result; where a great number of preference rules re- 
inforce each other, a stereotypical grouping structure results. (1983: 313) 

In other words, the rational, logically consistent part of their theory says a great 
deal about groups but cannot, of itself, find them in the music. 

The similarity between this state of affairs and the status of segmentation in the 
set-class theory/analysis mix suggests that it might be possible to go beyond the ap- 
parently ad hoc nature of set-class analytical practices by framing some preference 
rules, in Lerdahl and Jackendoff's sense, that describe how set-class analysis is done.
To do this, we might look at the kinds of criteria that various writers on
set-class-based analysis have come up with, and rephrase them as preference
rules. The first of these paraphrase Forte's own criteria:

1. Prefer segments that have a beginning and an end, both of which are determined
in some way, for example, by an instrumental attack or a rest. (cf.
Forte 1973: 84)

2. Prefer segments that correspond to sets identical with or related to those of
other segments. (cf. Forte 1973: 85-88, 91-92 and passim)

3. Prefer to designate a configuration that is isolated as a unit by conventional
means, such as: (a) a rhythmically distinct figure; (b) something indicated
by a notational feature, such as a rest or a beam; (c) a chord; (d) an ostinato
pattern. (1973: 83)

4. Prefer vertical groupings through the entire texture. (cf. Forte 1973: 89)

5. Prefer to form composite segments from subsegments that are contiguous or
linked in some other way. (cf. Forte 1973: 84)

6. Prefer composite segments that do not extend across a rest in all parts. (cf.
Forte 1973: 90)

7. Prefer a segmentation based on knowledge of a particular composer's way of
composing. (cf. Forte 1973: 92; and Forte 1972)

From Joel Lester (1989: 89-90): 

8. Prefer groupings of pitches that are important to the sound of the passage. 

9. Prefer segments that enhance our hearing and understanding of the piece. 

10. Prefer segments formed of pitches that appear together: (a) consecutively as a 
melody; (b) simultaneously as a harmony; (c) associated texturally or timbrally 
as in the accompaniment to a melody; (d) related in some other way. 

And from Bryan Simms (1993: 127): 

11. Prefer segmentations created by rhythm and meter.

12. Prefer segmentations created by the presentation of notes as chords. 

13. Prefer segmentations created by the placement of rests. 
Figure 7.3. Grouping structure, metrical structure, and time-span reduction of Mozart, Symphony No. 40 in G minor, bars 1-22 (from
Lerdahl and Jackendoff 1983: 259).

Figure 7.4. Prolongational reduction of Mozart, Symphony No. 40 in
G minor, bars 1-22 (from Lerdahl and Jackendoff 1983: 260).



14. Prefer segmentations created by groupings of notes under a slur or phrase 
mark. 

15. Prefer segmentations created by motivic elements.

16. Prefer segmentations created by disjunctions in register or color. 

These formulations are generally less specific than are Lerdahl and Jackendoff's
for tonal music, but in substantive terms there is a great deal of common ground 
among these three (and other) authors. And, like the Lerdahl/Jackendoff rules, these 
rules do not simply take, as their inputs, the results of well-formedness rules oper- 
ating in another part of a broader musical system, while passing on nothing in re- 
turn. Instead, they form part of a network, feeding their own results into preferen- 
tial judgments being made elsewhere. To put this another way: the things that Forte's 
theory of set-classification leaves out — rhythm, register, dynamics, instrumenta- 
tion, the words in a vocal piece, and so forth — still feed into pc set analysis through 
the practical business of segmentation. 
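
The division of labor described above, with well-formedness conditions acting as necessary constraints and preference rules acting as defeasible ones, can be made concrete in a small sketch. The Python below is illustrative only: the event encoding (MIDI numbers, with None standing for a rest in all parts), the three rules chosen (loosely modeled on rules 1, 6, and 16), the octave threshold, and the equal weighting of the rules are all assumptions made for the example, not part of any of the cited authors' formulations.

```python
from typing import List, Optional, Tuple

Event = Optional[int]      # MIDI pitch, or None for a rest (hypothetical encoding)
Segment = Tuple[int, int]  # half-open span [start, end) of event indices


def is_well_formed(events: List[Event], seg: Segment) -> bool:
    """Necessary condition: an in-range, contiguous span with at least one note."""
    start, end = seg
    in_range = 0 <= start < end <= len(events)
    return in_range and any(e is not None for e in events[start:end])


def bounded_by_rests(events: List[Event], seg: Segment) -> bool:
    """Loosely after rule 1: the span starts and ends at a rest or at the edge."""
    start, end = seg
    starts_clean = start == 0 or events[start - 1] is None
    ends_clean = end == len(events) or events[end] is None
    return starts_clean and ends_clean


def has_no_internal_rest(events: List[Event], seg: Segment) -> bool:
    """Loosely after rule 6: the span does not extend across a rest."""
    start, end = seg
    return all(e is not None for e in events[start:end])


def stays_in_register(events: List[Event], seg: Segment, max_leap: int = 12) -> bool:
    """Loosely after rule 16: no leap larger than max_leap semitones (assumed value)."""
    start, end = seg
    pitches = [e for e in events[start:end] if e is not None]
    return all(abs(b - a) <= max_leap for a, b in zip(pitches, pitches[1:]))


PREFERENCE_RULES = [bounded_by_rests, has_no_internal_rest, stays_in_register]


def preference_score(events: List[Event], seg: Segment) -> int:
    """Count the preference rules a well-formed segment satisfies."""
    if not is_well_formed(events, seg):
        raise ValueError("not a well-formed segment")
    return sum(rule(events, seg) for rule in PREFERENCE_RULES)


events = [60, 62, 64, None, 72, 74, None, 61]
candidates = [(0, 3), (0, 6), (1, 5)]
ranked = sorted(candidates, key=lambda s: preference_score(events, s), reverse=True)
```

Here no single rule decides the outcome: a candidate that violates one rule can still outrank another that violates two, which is the sense in which each rule is "a sufficient condition only in the absence of conflicting evidence."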

Modeling Analytical Practice with Formal Systems

So far, we have examined ways of working, and complexes of ideas, in which there 
is a clear methodological gap between the uninterpreted contents of a musical score 
(something we might think of as "data") and the application of rules in relation to an 
interpretative process that we call "analysis" — which in its classic form is under- 
taken through musical contemplation. We have seen examples of formally expressed 
music theory, and of the quasi-formal description of how an analyst engages with the
score, but we have not yet considered any work that attempts to capture the pro- 
cesses of music analysis themselves in terms of genuinely formal rules. It is perhaps 
not surprising to find that the discipline that initially pushed hardest to achieve this 
was artificial intelligence (AI) research, for in this field the idea that a musical score 
might be treated as data, and that empirical processing of it might be accomplished 
automatically, was akin to the way researchers conceived the classic domains of AI: 
natural language, visual recognition, and complex gaming (such as chess). 

One of the most notable attempts to use the early digital computer for music 
analysis was undertaken by Terry Winograd (1968). Winograd, whose reputation 
was made in the field of artificial visual intelligence, took the classic AI approach of 
modeling: that is to say, his computer program was not designed to produce a new 
kind of music analysis, but to produce by artificially intelligent means the kinds of 
analysis that human beings could already make. In common with a great deal of later 
similar research in the same vein, Winograd's work was based on a grammatical 
framework, in his case the "systemic linguistics" of M. A. K. Halliday: 

Systemic grammar is based on the fact that any sentence in a language will 
exhibit a number of features which the speaker has selected from a limited 
and tightly organized set available in that language. What is important 
in the understanding of a language is the way in which these features are dependent
on each other. ... A feature T1 may be conditional on a feature T2,
in that the presence of T1 implies the presence of T2 . . . [or] may on the
other hand preclude the possibility of T2. Thus, "interrogative" and "declara- 
tive," or "triad" and "7th" are incompatible pairs. . . . Further, a set of mutu- 
ally exclusive features may exhaust the possibilities for an obligatory property 
of the sentence. . . . Such sets are called systems and form a key part of the 
theory. . . . The diatonic note system in music has seven terms (A, B, . . . G) 
while the Linearity system used in this grammar has six: (passing, anticipa- 
tion, suspension, auxiliary, contained in adjacent chord, and nil). (Winograd 
1968: 9) 

In other words, one component of the grammar embodies a rule that a note, if it is 
implicated in a linear relationship, must do so in one and only one of the six ways 
listed. 
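
The notion of a "system" as a closed set of mutually exclusive terms maps naturally onto an enumerated type. The following Python sketch is only an illustration of that idea: the class and member names are hypothetical, and it makes no attempt to reproduce Winograd's LISP implementation or the dependencies between systems.

```python
from dataclasses import dataclass
from enum import Enum

# The diatonic note system: seven terms, exactly one of which applies to a note.
NoteName = Enum("NoteName", "A B C D E F G")


class Linearity(Enum):
    """The six terms of the Linearity system quoted above."""
    PASSING = "passing"
    ANTICIPATION = "anticipation"
    SUSPENSION = "suspension"
    AUXILIARY = "auxiliary"
    CONTAINED = "contained in adjacent chord"
    NIL = "nil"


@dataclass(frozen=True)
class Note:
    """A note carries one and only one term from each system."""
    name: NoteName
    linearity: Linearity

    def __post_init__(self):
        # Reject anything that is not a single term of the relevant system.
        if not isinstance(self.name, NoteName):
            raise TypeError("name must be a NoteName term")
        if not isinstance(self.linearity, Linearity):
            raise TypeError("linearity must be a Linearity term")


passing_c = Note(NoteName.C, Linearity.PASSING)
```

Because a field holds a single enumeration member, a note cannot be both "passing" and "suspension" at once, and a note with no linear function is still obliged to take the term NIL.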

Winograd organized his grammar into five "ranks": Composition, Tonality, 
Chord Group, Chord, and Note. His diagram of the "network" of systems at the rank 
of Chord is shown in Figure 7.5. The program, written in LISP, operated by parsing 
the music in retrograde motion, a procedure chosen because the function of a chord 
in tonal music is very often governed by the chord that precedes it. 
Figure 7.5. Systemic chord grammar (from Winograd 1968: 17).



A journey through the music in chronological sequence "from left to right" 
raises a number of possible expectations at many given moments (see Meyer 1956, 
1973; Narmour 1977, 1990, 1992), all of which would have had to be kept track of 
by Winograd's program, whereas going through the music "from right to left" meant 
that the program had fewer possible paths to calculate because it had parsed the res- 
olution of a given harmony before coming to the harmony itself. (We should note, 
incidentally, that presenting the musical data to the program as a sequence of chords 
may involve an act of segmentation on the part of the researcher.) The program pro- 
duced impressive results, comparable with analyses that might have been produced 
by human analysts (see Figure 7.6). The fact that it did so, despite operating in a 
counter-intuitive "right to left" manner, illustrates how the modeling approach is 
concerned fundamentally with the plausibility of the end product as an imitation, 
rather than with replicating psychological processes (at least those of listeners). 

Winograd's pioneering example established that the modeling approach is vi- 
able for simple tonal analysis. What are the prospects for applying this approach to 
set-class analysis? As we have seen, in this case it is the business of segmentation that 
is the obstacle, because it seems amenable only to semiformal description at best. To 
show this, it is instructive to look at the shortcomings of a more ambitious attempt 
to automate the segmentation process, the rule-based system developed by James 
Tenney (1980). Among the pieces he worked on were two for solo flute: Debussy's 
Syrinx (1912) and Varèse's Density 21.5 (1936). The monophonic nature of these
pieces immediately reduced the complexity of the segmentation process: in effect the 
Figure 7.6. Analysis of "Puer natus in Bethlehem" generated by
Winograd's grammar (from Winograd 1968: 41).



segmentation needed only to be "vertical" — as it also was in the case of Winograd's 
chord parsing program — and the complexities of atonal contrapuntal texture were 
thus avoided. Tenney's approach, then, addressed some but not all of the issues fo- 
cused in the preference rules for atonal segmentation listed earlier in this chapter. 

Tenney's vertical segmentation divided the music into time-spans — which he 
termed temporal gestalt-units (TGs) — by giving specific weighting to quantifiable 
factors, such as rests, dynamics, and large intervals. His model produced hierarchi- 
cal segmentations that reflected the grouping of groups, the grouping of groups of 
groups, and so forth. A fundamental principle he applied was that "The perceptual 
formation of TGs at any hierarchical level is determined by a number of factors of 
cohesion and segregation, the most important of which are proximity and similar- 
ity; their effects may be described as follows: relative temporal proximity [and] rel- 
ative similarities of TGs at a given hierarchical level will tend to group them, per- 
ceptually, into a TG at the next higher level. Conversely, relative temporal separation 
and/or differences between TGs will segregate them into separate TGs at the next 
higher level." (1980: 208) 

Tenney went on to develop measures of proximity in four domains: duration, 
pitch, intensity, and timbre. These measures were then assigned weightings — so 
that, for example, it could be determined whether a boundary existed between two 
notes that were close in temporal terms but distant in terms of pitch, or whether a 
sudden fortissimo would create a segment boundary even when there was little dif- 
ference between the notes concerned in terms of, say, pitch and timbre. Thus the sys- 
tem of weights operated in lieu of an array of preference rules. In fact, Tenney used 
different weighting factors in different analyses, adjusting the computer implementation
(which was written by Larry Polansky) so as to produce the "best" (i.e., preferred)
results in each case (1980: 219-220).
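
The role of the weights can be illustrated with a deliberately reduced sketch. The Python below is not the Tenney and Polansky program: it covers only three of the four domains (time, pitch, intensity), forms groups at a single hierarchical level, and places a boundary wherever the weighted distance between successive notes is a local maximum, which is an assumption adopted here for illustration. The field names, units, and default weights are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Note:
    onset: float      # time of attack, in beats (assumed unit)
    pitch: int        # MIDI note number
    intensity: float  # dynamic level on an arbitrary 0-1 scale


def distance(a: Note, b: Note, w_time=1.0, w_pitch=0.5, w_intensity=2.0) -> float:
    """Weighted sum of the differences between two successive notes."""
    return (w_time * abs(b.onset - a.onset)
            + w_pitch * abs(b.pitch - a.pitch)
            + w_intensity * abs(b.intensity - a.intensity))


def segment(notes: List[Note], **weights) -> List[List[Note]]:
    """Start a new group wherever the distance is a strict local maximum."""
    d = [distance(notes[i - 1], notes[i], **weights) for i in range(1, len(notes))]
    starts = [0]
    for k in range(1, len(d) - 1):
        # d[k] separates note k from note k + 1.
        if d[k] > d[k - 1] and d[k] > d[k + 1]:
            starts.append(k + 1)
    ends = starts[1:] + [len(notes)]
    return [notes[s:e] for s, e in zip(starts, ends)]


# Evenly spaced notes with an octave leap in the middle.
rising = [Note(float(i), p, 0.5) for i, p in enumerate([60, 62, 64, 76, 77, 79])]
with_pitch = segment(rising, w_time=1.0, w_pitch=1.0, w_intensity=0.0)
time_only = segment(rising, w_time=1.0, w_pitch=0.0, w_intensity=0.0)
```

With pitch weighted, the leap produces a boundary and two groups of three notes; with pitch ignored, the evenly spaced onsets yield a single group. The same passage is thus segmented differently under different weightings, which is why the choice of weights carries so much of the analytical burden.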

The beginning of Tenney's segmentation of Varèse's Density 21.5 is shown above
the flute stave in Figure 7.7. This may profitably be compared with a segmentation 
of the same piece by Jean-Jacques Nattiez, originally published in French in 1975 
(see Nattiez 1982), which Tenney shows below the stave. Nattiez was an important 
figure in the semiology of music (see Nattiez 1975, 1990), and his segmentation was 
undertaken as a preliminary stage in a grand comparative study of analyses of the 
work. The idea was that the initial segmentation could be undertaken without preju- 
dice as to musical signification, so that by comparing other analyses with this seg- 
mentation their underlying priorities of interpretation could be exposed. The major 
flaw in this argument, of course, is that segmentation simply can't be done in such a 
"neutral" way by an experienced musician. For example, the fact that Nattiez could 
read music meant that he was bound to make some kinds of judgment about musi- 
cal signification on the basis of the notation — indeed, perhaps the best he could 
hope for was to suspend some kinds of judgment and hope that people would trust 
him. Still, such relativism is something that circumstances impose on semiology as 
a whole, and it doesn't invalidate that discipline any more than Einsteinian rela- 
tivism invalidates physics. 

Tenney's discussion of the difference between their two analyses hinges on a 
concern that the weightings between parameters may need further adjustment: "dis- 
crepancies . . . remain which suggest that our weightings may not be quite 'opti- 
mum' after all, or that they are simply different from those unconsciously assumed 
by Nattiez. . . . Finally, however, I must say that I think our segmentation represents 
the perceptual 'facts' here more accurately than Nattiez's at certain points" (Tenney 
1980: 221).

For the English translation of his analysis, Nattiez prepared a rebuttal of Ten- 
ney's points, largely in order to defend his own segmentation (Nattiez 1982: 
324-329). But, from the point of view of anyone wanting to refine Tenney's ap- 
proach to modeling segmentation, the most telling point that Nattiez makes is that 
the balance of weightings between parameters should be flexible not merely from 
one piece to another but also within the boundaries of the individual piece. This is 
tantamount to saying that the system of weightings is not sufficiently complex, or 
perhaps not sufficiently multileveled, to model the kinds of perceptual judgment 
that actually take place within specific contexts. Given my argument that the weight- 
ings in Tenney's system actually fulfill the same function as a complex of preference 
rules, the question arises as to whether either approach (or a combination of them) 
is an adequate basis for automated replication of the complexities of musical seg- 
mentation. 

However, there is a significant alternative to both of these. The use of neural net- 
works (or "nets") as a tool for AI modeling of pattern matching quickly gained the 
reputation of being a stronger way of producing artificially intelligent machines than 
rule-based systems. The term "neural network" refers to the intention to make such
a system akin to the brain in terms of its underlying organization, but without claiming
that the way in which an artificial neural network behaves in performing a particular
Figure 7.7. Segmentations by James Tenney and Jean-Jacques Nattiez of Varèse, Density 21.5 (from Tenney 1980: 222).
task is, at the level of its quasi-neuronal "circuitry," the same as the behavior
of an organic brain accomplishing the same task. Motivating this approach — which 
was developed more or less independently by researchers in the United States, Eu- 
rope, and Japan in the 1970s — was the desire to make machines that could learn. 
In its most basic form — actually, too basic to be useful except as an introductory de- 
scription — a neural network comprises a group of input nodes connected to a group 
of output nodes. For a simple musical application, there might be 12 input nodes
corresponding to the 12 pitch classes, and a number of output nodes representing 
common chords. A dataset, for instance a chorale harmonization, could be used as 
input material to this: the likely imbalance among the pitch-classes — since chorales 
tend not to be made up of 12-tone rows! — would mean that some input nodes were 
"stimulated" more than others. As a result, the strength (measured by a variable nu- 
merical value) of the connections between those nodes and the corresponding out- 
put nodes would be enhanced and others reduced. Overall, then, the network would 
in some sense have "learned" through its "experience" of the data. This outline de- 
scription only hints at the practical complexities of neural networks used by re- 
searchers, for example, in terms of "hidden" layers of nodes, nodes that inhibit as 
well as those that excite, and the classic "back-propagation" method that is intended 
to improve the performance of the system as a whole. 
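
The basic arrangement just described, 12 pitch-class inputs connected directly to chord outputs, can be sketched in a few lines of Python. This is a toy under stated assumptions: it uses a supervised delta rule (one simple way of strengthening and weakening connections, not necessarily what any cited researcher used), it has no hidden layer and no back-propagation, and the three-chord training set, learning rate, and epoch count are invented for the example.

```python
# Training data: pitch-class sets for three common chords (hypothetical dataset).
CHORDS = {"C": {0, 4, 7}, "F": {5, 9, 0}, "G": {7, 11, 2}}
LABELS = list(CHORDS)


def encode(pcs):
    """Turn a set of pitch classes into 12 input activations."""
    return [1.0 if pc in pcs else 0.0 for pc in range(12)]


def activate(weights, x):
    """Each output node sums its weighted inputs."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train(examples, epochs=50, rate=0.1):
    """Delta rule: nudge each connection in proportion to the output error."""
    weights = [[0.0] * 12 for _ in LABELS]
    for _ in range(epochs):
        for pcs, label in examples:
            x = encode(pcs)
            y = activate(weights, x)
            for k, name in enumerate(LABELS):
                error = (1.0 if name == label else 0.0) - y[k]
                for i in range(12):
                    weights[k][i] += rate * error * x[i]
    return weights


def classify(weights, pcs):
    """Report the chord whose output node is most strongly activated."""
    y = activate(weights, encode(pcs))
    return LABELS[max(range(len(LABELS)), key=y.__getitem__)]


weights = train([(pcs, name) for name, pcs in CHORDS.items()])
```

After training, the connection from pitch class E (4) to the C-major node has been strengthened while that from F (5) has been weakened, and the "knowledge" of the three chords exists only as this pattern of numerical weights.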

One of the questions that has bedeviled research using neural networks is: 
When the network learns, just what does it learn? The question arises because the 
learning is not set out explicitly in terms of rulelike descriptions of responses to cer- 
tain types of input ("if you give it an X, then the network does such and such"); in- 
stead, the network's learning is embedded in its changed patterns of connection. Al- 
though methods exist to unravel the network patterns so that they can be usefully 
examined, there remains a difference in kind between these declarative and proce- 
dural descriptions of intelligent behavior. Since music theory is almost exclusively 
expressed in rule-based, declarative terms that encapsulate overt knowledge, the 
procedural learning that is accomplishable by neural networks might seem com- 
paratively unpromising in this field. Rather, one might expect to use neural networks 
to model the kind of high-level musical processes that experts seem to find easier to 
do than to explain: musical composition, for example, or interactive performance 
between human musicians and computers (Rowe 1993: 229-237). Alternatively, 
in line with the classic use of neural networks in artificial visual intelligence, we 
could see their musical niche in modeling low-level psychoacoustic phenomena. 
Marc Leman embraces this idea, and discusses the distinction that it implies between 
symbolic and subsymbolic mental processing: "The subsymbolic representation 
assumes that there is more to mental representation than just symbols. For example, 
when we imagine a chord we do not recall the symbols, but we hear it internally. . . . 
Since there is no consensus on how a musical image can be described in a non- 
symbolic way, the definition of subsymbolic representation is diffuse from the begin- 
ning. Initially, we would define it as the level between the acoustical and the symbolic 
representation . . . ." (Leman 1993: 132; see also Leman 1992) 

The trouble is, Leman is clearly trying to define something that, properly speak- 
ing, is indefinable — and indeed this difficulty signals a major issue that emerges in 
the attempts of various writers to assess how neural networks can, or should, be applied
to musical research. One characteristic of a neural network is that it remains to
some extent inscrutable: but does this imply that neural network modeling should 
be restricted to musical phenomena that are also more or less inscrutable? Or would 
this simply amount to a confusion of the ends with the means? The music psychologist
Jamshed Bharucha has adopted a median position, by using neural networks
to model the kinds of chord and key recognition that one finds in basic music
theory, and he helpfully contrasts neural network models with symbolic grammars: 
"I suspect . . . that although highly trained musicians may use formal symbolic 
processes together with a host of other processes, the passive processing of music by 
most listeners is minimally symbolic. What then does one make of rule-based theo- 
ries of music, such as that of Lerdahl and Jackendoff (1983)? These can be construed 
as formalizations of constraints on neural processing of music. In other words, either 
neural nets are implementations of grammars, or grammars are formal descriptions 
of neural nets." (Bharucha 1999: 436) 

A harder line, however, is taken by John Rahn (1994), who — presumably on 
the basis that present-day computers cannot avoid using symbolic data — has argued 
that even a neural network that models the perception of pitch from acoustic data 
"does nothing but process representations." Rahn maintains that, despite the in- 
scrutability of such a network as compared with a formal description of symbolic 
relations — and despite the fact that the network is modeling something that lies in 
the realm of subconscious mental processing — "this net is doing symbolic process- 
ing, and no processing model could do otherwise" (Rahn 1994: 232). Indeed, as 
Rahn goes on to indicate, the work of Robert Gjerdingen (1990) shows that neural 
networks can with advantage be put to work on musical data that are unambigu- 
ously symbolic. 



Beyond Modeling? 

Gjerdingen's work differs from the modeling approach in that, while the focus re- 
mains the end product, it is a product that — at last, one might say! — has the poten- 
tial to "tell us something about the piece," or at least about a style. In line with his 
larger research interest into the usage of specific pitch schemata (small-scale patterns) 
in the classical style (see Gjerdingen 1988), Gjerdingen set up a self-organizing neu- 
ral network that would take as its input some simple keyboard pieces in this style, 
written by Mozart as a child, and give as its outputs 25 (as yet unforeseen)
schemata. The input data were organized on an event-by-event basis, and included 
representations of the pitches and contours of the melody, bass, and inner voices, to- 
gether with simulated short-term memory traces of recently preceding events: "After 
the pieces had been taught to the network 12 times . . . and after [it] had had a 
chance to make nearly 10,000 successful categorizations, a stable category structure 
emerged. . . . When learning began [the output layer] was inchoate. Then it slowly 
organized itself, by itself, as it encountered [input-layer] patterns that bore family re- 
semblances to each other" (Gjerdingen 1990: 357). 

Gjerdingen's interpretation of these stable output categories is shown in Figure 7.8,
in which he translates the numerical data into musical notation. The patterns
do not match high-level music theory concepts such as cadence types, although
they did encourage him to suggest with confidence a small amendment to
the New Mozart Edition. Adding two further levels to the network, however, en- 
abled it to recognize cadence types (1990: 365-366). The results of this incremen- 
tal approach are interesting with respect to our criteria on music analysis: the emer- 
gence of "cadence detectors" at the fourth level validates the expanded network as a 
model, which is reassuring but "tells us" little that we didn't already know about the 
music; the unfamiliar concepts that emerge from the simpler two-level network, on 
the other hand, usefully reveal aspects of the music that are unfamiliar in terms of 
explicit theory. 

The question of complexity versus simplicity in music theory is particularly rel- 
evant to Eugene Narmour's "implication-realization" theory of melodic structure 
(1990, 1992). His approach is through feature analysis, and the complexity of his 
approach can be gauged by the fact that so many categories are invoked that the 
mnemonics referring to them are continually repeated for the reader's convenience 
at the foot of the page in his two books on the subject. In fact, through empirical 
testing and statistical analysis, Glenn Schellenberg (1997) found that the five under- 
lying principles of Narmour's theory could be satisfactorily reduced to two, pitch 
proximity and pitch reversal; each applies to a third note following what is termed 
an implicative interval. The pitch proximity principle "states that when listeners hear 
an implicative interval in a melody, they expect the next tone to be proximate in pitch 
to the second tone" (1997: 309); the pitch reversal principle applies more strongly 
to large implicative intervals than to small ones, and predicts that "listeners often ex- 
pect the [third] tone ... to be proximate to the first tone of the implicative interval" 
as well as to the second (1997: 312). In other words: listeners expect that melodies 
will be generally conjunct, that leaps will tend to be small, and that a larger leap will 
generally be followed by a smaller interval in the opposite direction 1 — findings that 
are very much in line with the results of Knud Jeppesen's pioneering study of Palestrina's
style (1927), which is itself a classic example of empirical musicology. All this
seems at first sight to undermine Narmour's work considerably, both by characteriz- 
ing it as overdetermined and, though Schellenberg doesn't explicitly make this 
point, by showing that it has merely the predictive power of a model rather than the 
enlightening potential of good analysis. In the light of Gjerdingen's work, however, 
one might wonder whether the very over-determination of Narmour's theory could 
put it into the same category as Gjerdingen's initial, two-layer network, so that the 
fact that its categories are at a lower level than those of Schellenberg's revision might 
actually make them more interesting to use in the analysis of specific melodies. This 
is likely to depend on whether it turns out that Narmour has in fact made explicit 
some useful concepts that conventional, high-level melodic theory normally glosses 
over. Time will tell: an alternative possibility is that they will be thought no more 
worthwhile than Babbitt's equations about serial rhythm. 
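
The two principles that Schellenberg retains, as paraphrased above, lend themselves to a compact sketch. The Python below is a simplification for illustration and does not reproduce Schellenberg's quantification: the 7-semitone threshold for a "large" implicative interval, the size of the reversal bonus, and the function name are all assumptions.

```python
def expectedness(first: int, second: int, third: int,
                 large: int = 7, reversal_bonus: int = 6) -> int:
    """Score a candidate third tone after the implicative interval first -> second.

    Pitches are MIDI note numbers; higher scores mean "more expected".
    """
    # Pitch proximity: the closer the third tone is to the second, the better.
    score = -abs(third - second)

    # Pitch reversal (applied here to large intervals only): reward a smaller
    # interval in the opposite direction, which moves back toward the first tone.
    leap = second - first
    step = third - second
    if abs(leap) >= large and step * leap < 0 and abs(step) < abs(leap):
        score += reversal_bonus
    return score


# After an upward leap of a major sixth (C4 to A4), compare three continuations.
after_leap = {third: expectedness(60, 69, third) for third in (67, 71, 60)}
```

In this example the step back down to G4 scores highest, the step further up to B4 comes second, and a return by leap to C4 comes last, which matches the verbal summary: melodies are expected to be conjunct, and a large leap is expected to be followed by a smaller interval in the opposite direction.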

Encouraging though Gjerdingen's work is, it is still — to follow Cook's distinc- 
tion — an approach that looks at individual pieces in order to deduce "more general 
principles of musical structure": in other words, a tool for theory-building rather 
than analysis. A recent line of research that is genuinely analytical in orientation, 
Figure 7.8. Pitch/contour schemata recognized in early Mozart by Robert O.
Gjerdingen's ART pour l'art neural network (from Gjerdingen 1990: 360; "The largest
noteheads indicate the strongest pitch traces, arrows signify traces of contour, 'd'
means the trace of a contrapuntal dissonance, and 'TT' signifies a harmonic tritone.")
however, and that also reduces the "gap" between uninterpreted score data and the
input to a formalized empirical system, is David Temperley's work in the field of au- 
tomated harmonic analysis (Temperley 1997). The system produced by Temperley 
and his coworker, Daniel Sleator, is intended to be capable of analyzing common- 
practice tonal music in terms of triadic harmonies and seventh chords (its analysis 
of the melody "Yankee Doodle" is reproduced in Figure 7.9). 

In doing so, it does not rely on human judgments about segmentation, but can 
judge for itself where each new harmony begins, based on five preference rules: 

Pitch Variance Rule: Try to label nearby pitches so that they are close
together on the line of fifths. . .

Compatibility Rule: In choosing roots for chord spans, prefer certain . . .
root relationships over others. Prefer them in the following order: 1, 5, 3,
b3, b7, ornamental. (An ornamental relationship is any relationship
besides these five.) . . .

Strong-Beat Rule: Prefer chord spans that start on strong beats of the
meter. . . .

Harmonic Variance Rule: Prefer roots that are close to the roots of nearby
segments on the line of fifths. . . .

Ornamental Dissonance Rule: An event is an ornamental dissonance if it
does not have a chord-tone relationship to the chosen root. Prefer
ornamental dissonances that are closely followed by an event a step
[whole tone] or half-step [semitone] away in pitch height. (Temperley
1997: 49-54)
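
To give a sense of how such rules become computable, the Compatibility Rule alone can be sketched in Python using positions on the line of fifths (... Bb, F, C, G, D ...). This is an illustration under assumptions and not the Temperley and Sleator program: the numerical scores are hypothetical (only their ordering comes from the quoted rule), root candidates are restricted to pitches actually present, and the other four rules, including meter, are ignored.

```python
# Position of each natural note on the line of fifths, with C at 0.
LETTER_POSITION = {"F": -1, "C": 0, "G": 1, "D": 2, "A": 3, "E": 4, "B": 5}

# Line-of-fifths offset from the root, mapped to the relationship it represents.
RELATIONSHIP = {0: "1", 1: "5", 4: "3", -3: "b3", -2: "b7"}

# Hypothetical scores; only the order 1 > 5 > 3 > b3 > b7 > ornamental is given.
SCORE = {"1": 5, "5": 4, "3": 3, "b3": 2, "b7": 1, "ornamental": 0}


def fifths_position(name: str) -> int:
    """Map a spelled pitch such as 'Bb' or 'F#' to its line-of-fifths position."""
    letter, accidentals = name[0].upper(), name[1:]
    return (LETTER_POSITION[letter]
            + 7 * accidentals.count("#")
            - 7 * accidentals.count("b"))


def relationship(root: str, pitch: str) -> str:
    """Classify a pitch relative to a candidate root."""
    offset = fifths_position(pitch) - fifths_position(root)
    return RELATIONSHIP.get(offset, "ornamental")


def compatibility(root: str, pitches) -> int:
    """Total score of a candidate root for the pitches in one chord span."""
    return sum(SCORE[relationship(root, p)] for p in pitches)


def best_root(pitches) -> str:
    """Pick the sounding pitch that best explains the others as chord tones."""
    return max(pitches, key=lambda r: compatibility(r, pitches))


root_of_seventh = best_root(["G", "B", "D", "F"])
```

For the span G, B, D, F the candidate G accounts for every pitch as a chord tone (1, 3, 5, b7), whereas B, D, or F as root would each leave at least two pitches ornamental, so G is preferred.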

The system is presented as being explicitly algorithmic: Temperley does not make 
psychological claims for it, beyond noting that "much evidence exists that harmonic 
analysis is performed by trained and untrained listeners during listening" (1997:
31). He also draws attention to the fact that, in waiting until the entire piece has been
processed before assigning an analysis to it, the algorithm is "clearly unsatisfactory 
as a model of listening" (1997: 58). But then, to produce a model of listening was 
not his intention. 

The more important question is: is this a model of analysis? Or does it actually 
analyze, in the sense of satisfying our earlier criteria? The fact that the system 
processes the entire piece before committing itself does not disqualify it from being 
a model of analysis — far from it. Indeed, its assignment of chord roots and judg- 
ments of segmentation seem exactly designed to model basic analytical processes. In 
particular, by accomplishing segmentation in an automated way — and far more 
convincingly than Tenney did — Temperley and Sleator have made an important 
contribution. (We should note that metrical data still have to be supplied to the pro- 
gram, though this is often in itself an automatic process of transcription.) However, 
there is then the question of what these analyses "tell us about the piece": do they 
have the power to change how we hear the music? In this regard I think there is less 
cause for enthusiasm, for the system is severely limited both in stylistic terms and in 
terms of the analytical results it presents. What is more, as Temperley candidly 
points out, when the system's judgments differ from those that he himself might 
make, these are not really intriguing new interpretations but, instead, "questionable 
Figure 7.9. Harmonic analysis of "Yankee Doodle" produced by
Temperley's preference rules as implemented in Sleator's program 
(from Temperley 1997: 63). 



choices," a "definite mistake," "problematic," or even "completely wrong" (1997: 
63-4). This system nonetheless has great potential for further development (see 
Temperley and Sleator 1999, Temperley 2001): what it perhaps most needs is to en- 
compass a greater range and complexity of musical judgment while retaining the 
most admirable of its present features. 

Some recent work by the present author can claim at least to meet the first of 
these desiderata. The "Tonalities" project (Pople forthcoming) is focused on the so- 
called "breakdown" of tonality around 1900: in contrast to this traditional view, it 
claims that "tonality" should not be viewed as a more or less fixed system, but means 
something different in, say, middle-period Debussy than in late Wagner — and dif- 
ferent again in late Mahler, early Schoenberg, Rachmaninoff, Sibelius, Busoni, 
Strauss, Vaughan Williams, Ives, Gershwin, and so forth. I argue that even where 
such music is regarded as tonal, but with exceptional features that require comment, 
treating these different musics as special cases endangers the working link between 
theory and analysis — so that one might ask whether it is tonal analysis, rather than 
tonality, that is breaking down. 

The system provides the capability for handling a wide diversity of tonal sys- 
tems by means of a theoretical framework that uses explicit theoretical definitions 
and analytical procedures, but is also highly configurable: not all features are neces- 
sarily active for every analysis, and some features can be fine-tuned to provide fur- 
ther flexibility within well-defined constraints. This work draws on a range of tonal 
theories, including neo-Riemannian theory (Cohn 1995, 1997, 1998; Krumhansl 
1998), and also uses some aspects of set-class theory. Although focused on the diver- 
sity of tonalities around 1900, it can handle both Bach chorales and "atonal" music 
that is texturally similar to tonal models. The complexity of the theory is such, how- 
ever, that its analytical application is only practicable using a computer: thus a large 
degree of automation is involved. The essence of the system in practice is that the an- 
alyst defines a tonal system by selecting from a range of detailed options (Language 
Settings); then, taking as its input a simplified score representation of the music, the 
computer software follows through the analytical implications of these choices, using 
a raft of analytical procedures — themselves numerous and complex — designed to 
produce communicable output that may lead the analyst to refine the choice of Language
Settings. This process of exchange between human and machine typically
continues until the analyst is satisfied both that the analysis produced by the soft- 
ware satisfactorily matches his or her detailed judgments about the specific piece or 
extract being analyzed, and that the Language Settings satisfactorily match his or her 
conception of the music at a stylistic level: in a sense, the determination of the most 
appropriate Language Settings constitutes the most important outcome of the ana- 
lytical process. At the same time, and importantly in the present context, this itera- 
tive process is liable to include phases where the software introduces possibilities 
that the analyst may not have considered, but which nonetheless appear on reflec- 
tion highly plausible. In this sense, the Tonalities system has the potential to satisfy 
our basic criteria on analysis, because it can change the way the analyst hears and 
thinks about the piece, as well as helping to make explicit the criteria underlying ap- 
parently intuitive judgments. 

The program is implemented in the form of an extensive Visual Basic "plug-in" 
for Microsoft Excel, with the simplified score representation taking the form of a 
spreadsheet: pitch data are supplemented by indications of metrical stress (as in 
Temperley's system), and by a "vertical" segmentation into harmonic areas. (In the 
latter respect, the Tonalities system leaves a wider "gap" between the uninterpreted 
score and the input data than Temperley's does; in compensation, its stylistic range 
and complexity of analytical judgment are considerably greater.) An analysis of a 
striking but harmonically complex passage from Schoenberg's tone poem Pelleas 
und Melisande is shown in Figures 7.10-11: the repeated two-bar figure is seg- 
mented at the half-bar level, and the analysis consists of reports on each of the four 
segments, followed by a summary. The segments are analyzed by the software in 
chronological order, with diminishing memory traces of earlier segments being 
maintained (as in Gjerdingen's work), but without looking ahead beyond the imme- 
diate segment boundary. As Tonalities works through the segments, the immediately 
preceding segment may be re-evaluated in retrospect, but this would only be done 
in order better to "understand" the segment currently being analyzed — the report 
on that earlier segment, once issued, is not revised. 

The Tonalities software comes with a large library of built-in gamuts (scales, 
modes and the like) and chords, and it is possible to customize it by adding new 
ones. When the software analyzes a segment, it matches the pitches and texture of 
the segment to as many as possible of the chords and gamuts selected in the Lan- 
guage Settings. It assigns a rating to each of these, then pairs each possible chord 
with each possible gamut — a chord/gamut pair is at this stage termed a prolonga- 
tion — and assigns a composite rating to each of these pairs. Next, it filters the list of 
possible prolongations by applying a range of tests, comparisons, and preferences, 
until there emerges a single best prolongation (or a few equal-best prolongations), 
which it reports: in the text of the segment reports (see Figure 7.11), the prolonged 
chord and the prolonging gamut are the components of the reportable chord/gamut 
pair. It is important to note that the prolonging gamut is not an identification of 
"key," but applies only within the segment, reflecting the motions between chord 
notes and nonchord notes. Similarly, the Chord function within segment is, as the name 
implies, a within-segment relationship, rather than a judgment in relation to some 
sort of "key" that applies in a notional larger context. In order to reinforce the point
that many of the system's judgments are based on criteria such as set membership, 
inclusion and exclusion, the Pitch-class content of the segment is reported, using set- 
class names and pc integer notation. Next comes a report on the dissonance analysis 
that Tonalities undertakes as a focal stage of the process of prolongation filtering: this 
part of the report serves to relate every nonchord note to one or more chord notes. 
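
The matching and pairing steps just described can be pictured with a toy sketch. It makes no claim to reproduce the Tonalities procedures: the `pitch_classes` helper, the two-entry chord and gamut tables, and the coverage-based `rate_pair` rule are all assumptions introduced here for illustration. Only the note names and the pc content of segment 1 are taken from Figure 7.11.

```python
# Toy illustration only: the tables and the rating rule are hypothetical,
# not taken from the Tonalities software.
from itertools import product

NOTE_TO_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def pitch_classes(names):
    """Convert note names such as 'Eb' or 'F#' to a sorted list of pc integers."""
    pcs = set()
    for name in names.split():
        pc = NOTE_TO_PC[name[0]] + name.count("#") - name.count("b")
        pcs.add(pc % 12)
    return sorted(pcs)

# Segment 1 of Figure 7.11: chord notes plus nonchord notes.
segment = pitch_classes("Eb Gb Bb D F Ab C")

# A deliberately tiny selection of chords and gamuts (assumed for the example).
chords = {
    "minor-major seventh on Eb": pitch_classes("Eb Gb Bb D"),
    "major triad on Eb": pitch_classes("Eb G Bb"),
}
gamuts = {
    "Eb melodic minor": pitch_classes("Eb F Gb Ab Bb C D"),
    "Eb harmonic minor": pitch_classes("Eb F Gb Ab Bb Cb D"),
}

def rate_pair(chord_pcs, gamut_pcs, segment_pcs):
    """Hypothetical composite rating: the fraction of the segment covered by the
    gamut, or zero if the chord is not contained in both segment and gamut."""
    chord, gamut, seg = set(chord_pcs), set(gamut_pcs), set(segment_pcs)
    if not (chord <= seg and chord <= gamut):
        return 0.0
    return len(seg & gamut) / len(seg)

# Pair every chord with every gamut and rate each pair.
ratings = {
    (c, g): rate_pair(chords[c], gamuts[g], segment)
    for c, g in product(chords, gamuts)
}
best = max(ratings, key=ratings.get)
print(segment)              # pc content of the segment
print(best, ratings[best])  # highest-rated chord/gamut pair
```

With these assumed tables the highest-rated pair happens to coincide with the prolongation reported for segment 1 in Figure 7.11, but the real system filters its candidates through many more tests, comparisons, and preferences than a single coverage score.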

From the second segment onward, the analytical reports include information 
about how the segment-to-segment transition has been analyzed. Two lines are added 
to the report, showing the link between the two chords in terms of a connective gamut 
and either a chord progression, root movement, trichord distance, or a count of common 
tones. (As with the prolonging gamut, the connective gamut should not be confused 
with conventional judgments of key, because it has a specific meaning here: it is a 
gamut that encompasses salient subsets of the two chords.) Tonalities reports the 
link from one chord to the next as a chord progression only if the connective gamut 
and both chords are functional: otherwise, it tries to report the root movement in re- 
lation to the prolonging gamut from one or other of the segments, if necessary in- 
voking the functional association(s) of a nonfunctional gamut (e.g., the association 
of the octatonic collection with dominant-quality chords, as in the report on seg- 
ment 4 in Figure 7.11). If the appropriate conditions apply, it will report the motion 
as a trichord distance in terms of the group-cycles established by Neo-Riemannian 
theory, as again shown in the report on segment 4. Alternatively, if it cannot do any 
of these things, it reports the number of common tones held between the salient sub- 
sets of the two chords (see the reports on segments 2 and 3). As the report on seg- 
ment 3 shows, Tonalities is prepared to look for diminished-fifth substitutions — and 
indeed other more extreme examples of function substitution — depending on how 
far the Language Settings depart from the default "common-practice" configuration. 

It is an important aspect of the system that chord types and gamut types can be 
selected or deselected from the Language Settings, and certain of their properties 
changed, as the user thinks fit. For example, if the "major triad" chord type is dese- 
lected then the software won't recognize any major triads. When it starts an analysis 
the software makes a summary assessment of the user's choice of Language Settings, 
and this affects much of its decision making, particularly during the process of pro- 
longation filtering. In effect, the system amplifies the analyst's hunches about the 
piece and allows them to be successively evaluated and refined. Among early users 
of the system (Nicholas Cook, Jonathan Dunsby, and Michael Russ), a point of de- 
bate concerned the extent to which one has to understand the inner workings of 
the software to get the best from it in terms of this dialogue. At the time of writing, 
the best answer to this question seems to be that the software implements an explicit 
body of music theory, so that (as with any other theory) expertise will reap rewards; 
but at the same time the operational component of the theory is of such complexity 
that probably no user (and this includes the author) can expect always to second- 
guess the analysis that will emerge from challenging musical situations. And since 
the empirical motivation for analysis constantly drives one to understand just such 
challenging music, the Tonalities system can be said to have the potential to partic- 
ipate, albeit as the junior partner, in an expert-to-expert dialogue about the analysis 
of specific pieces. 2 
Figure 7.10. Schoenberg, Pelleas und Melisande, rehearsal no. 59, bars 1-4.



Prospects 



At the beginning of this chapter I observed the importance of individual thought and 
judgment in analyzing music — this despite the fact that analytical writing addresses 
a community on the basis of shared perceptions and musical thought processes. One 
of the many corollaries of this dualism is that reading analysis is an individual mat- 
ter too, just as writing analysis is. Habitual readers of analysis will also have in mind 
their own individual interpretations of the works they're interested in from this point 
of view — probably not written down, perhaps not even fully thought through, but 
sufficient to subject the analyses they read to certain criteria of usefulness. In short: 
analysis must provide new insights, and written analyses are the currency in which 
these insights are traded. 
Schoenberg, Pelleas und Melisande, final section
Segment 1 (verticals 1 to 8) 

Prolonged chord: minor-major seventh on Eb [Eb Gb Bb D / F Ab C] 

Prolonging gamut: Eb melodic minor scale [Eb F Gb Ab Bb C D] 

Chord function within segment: I 

Pitch-class content: 7-34 (t=2) [0 2 3 5 6 8 10]

F4 [6, 4] as P between Eb4 [6, 3] and Gb4 [6, 5] 

C7 [1, 5] as N after D7 [1, 1]

Ab6 [2, 5] as N after Bb6 [2, 1] 

C6 [3, 5] as N after D6 [3, 1] 

Ab5 [4, 5] as N after Bb5 [4, 1] 

F3 [12, 7] as N after Eb3 [12, 1]

F4 [8, 8] as N after Eb4 [8, 7] 

Segment 2 (verticals 9 to 16) 

Prolonged chord: whole-tone dominant (b5) on Bb [Bb D Fb Gb Ab / F] with pedal Eb 

Prolonging gamut: Eb harmonic minor scale [Eb F Gb Ab Bb D / Fb] 

Chord function within segment: V 

Connective gamut: chromatic [D Eb Fb Gb Bb] 

Common tones: 1 

Root movement: I-V in terms of prolonging gamut (Eb harmonic minor)

Pitch-class content: 7-9A (t=2) [2 3 4 5 6 8 10]

F4 [7, 4] as N before Gb4 [7, 5] 

Segment 3 (verticals 17 to 24) 

Prolonged chord: dominant seventh on E [E G# B D / F# G A Cb] with double pedal Eb/Bb 

Prolonging gamut: A melodic minor scale [A B D E F# G# / Bb Cb Eb G] 

Chord function within segment: V 

Connective gamut: octatonic collection 1 [D E G# Bb B] 

Common tones: 1 

Root movement: V-bII[Vdim5] in terms of previous prolonging gamut (Eb harmonic minor)

Pitch-class content: 9-4B (t=2) [2 3 4 6 7 8 9 10 11]

Cb4 [10, 1] as enharmonic chord note 

Cb3 [11, 1] as enharmonic chord note 

A6 [1, 5] as N after B6 [1, 1]

F#6 [2, 5] as N after G#6 [2, 1] 

A5 [3, 5] as N after B5 [3, 1] 

F#5 [4, 5] as N after G#5 [4, 1] 

F#4 [6, 5] and G4 [6, 8] as chained adjacencies after E4 [6, 4] 

Segment 4 (verticals 25 to 32) 

Prolonged chord: dominant minor ninth with suspended fourth on Bb [Bb Cb Eb F Ab / D E G A] 

Prolonging gamut: octatonic scale on Bb [Bb Cb D E F G Ab / Eb A] 

Connective gamut: octatonic collection 1 [D E F Ab Bb Cb] 

Trichord distance: 4 

Root movement: bII-V by association with prolonging gamut (Bb octatonic)

Pitch-class content: 9-5B (t=2) [2 3 4 5 7 8 9 10 11]

Eb1 [16, 1] as tonic note under dominant harmony 

Eb2 [14, 1] as tonic note under dominant harmony 

G6 [1, 1] as N before F6 [1, 5]

G5 [3, 1] as N before F5 [3, 5] 

A4 [6, 1] as chromatic N before Ab4 [6, 5] 

D4 [7, 3] and E4 [7, 4] as chained adjacencies before F4 [7, 5] 

G4 [8, 4] as P between F4 [8, 1] and Ab4 [8, 5] 

Summary of analysis (4 segments) 

Chord types prolonged 

dominant seventh: 25.0% (1) 

dominant minor ninth: 25.0% (1) 

whole-tone dominant (b5): 25.0% (1) 

minor-major seventh: 25.0% (1) 

[15 other common-practice chord types active but unused] 

[3 other standard chord types active but unused] 

[1 custom chord type active but unused] 

Prolonging gamuts 

Eb minor: harmonic 25.0% (1); melodic 25.0% (1) 

A minor: melodic 25.0% (1) 

octatonic 1: 25.0% (1) 

[3 other gamut types active but unused] 

Connective gamuts 

octatonic 1: 66.7% (2)

chromatic: 33.3% (1) 

[5 other selectable connective gamut types active but unused] 

Figure 7.11. Analysis by Pople's Tonalities software of Schoenberg Pelleas 
und Melisande, rehearsal no. 59, bars 1-4. 
All of this sits uneasily with the perceived limitations of formally empirical enquiry.
As we have seen, impressive results have been gained by a variety of means in
modeling some of the outputs of traditional analysis, but this just hasn't been good 
enough for the analysts themselves. In recent years, however, ways have been found 
to use formal methods to produce analyses that can, on occasion, offer the kinds of 
insight that analysts want, while also modeling traditional, contemplative analysis 
convincingly enough to buy into the trade in such insights that has up to now been 
dominated by written analysis. There is thus a genuine prospect that empirical mu- 
sicology, as outlined in this book, will be able in future to participate in the devel- 
opment of music analysis, though not — I think — to change its fundamental nature 
as an exercise of the musical imagination. 



Notes 



1. See this volume, p. 117, for discussion of von Hippel's and Huron's critique of the
pitch reversal (gap fill) concept.

2. While Anthony Pople's untimely death has halted development of the "Tonalities"
project, the software can be obtained in fully functional form on the "Tonalities" web 
site (http://www.nottingham.ac.uk/music/tonalities), where further details can be 
found. [Eds.] 
CHAPTER



Analyzing Musical Sound 



Stephen McAdams, Philippe Depalle, and Eric Clarke 



Introduction 

Musicologists have several starting points for their work, of which the two most
prominent are text and sound documents (i.e., scores and recordings). One aim of
this chapter is to show that there are important properties of sound that cannot be 
gleaned directly from the score but that may be inferred if the reader can bring to 
bear knowledge of the acoustic properties of sounds on the one hand, and of the 
processes by which they are perceptually organized on the other. Another aim is to 
provide the musicologist interested in the analysis of sound documents (music re- 
corded from oral or improvising traditions, or electroacoustic works) with tools for 
the systematic analysis of unnotated — and in many cases, unnotatable — musics. 

In order to get a sense of what this approach can bring to the study of musical 
objects, let us consider a few examples. Imagine Ravel's Bolero. This piece is struc- 
turally rather simple, alternating between two themes in a repetitive AABB form. 
However, the melodies are played successively by different instruments at the be- 
ginning, and by increasing numbers of instruments playing in parallel on different 
pitches as the piece progresses, finishing with a dramatic, full orchestral version. 
There is also a progressive crescendo from beginning to end, giving the piece a 
single, unified trajectory. It is not evident from the score that, if played in a particu- 
lar way, the parallel instrumental melodies will fuse together into a single, new, com- 
posite timbre; and what might be called the "timbral trajectory" is also difficult to 
characterize from the score. What other representation might be useful in explain- 
ing, or simply describing, what happens perceptually? 

Figure 8.1 shows spectrographic representations (also called spectrograms) of the
first 11 notes of the A melody from Bolero, in three orchestrations from different sections
of the piece: (a) section 2, where it is played by one clarinet; (b) section 9, played
in parallel intervals by a French horn, two piccolos, and celesta; and (c) section 14, 
played in parallel by most of the orchestra including the strings. We will come back to 
more detailed aspects of these representations later, but note that two kinds of struc- 
tures are immediately visible: a series of horizontal lines that represent the frequencies 
of the instruments playing the melody, and a series of vertical bars that represent the 
rhythmic accompaniment. Note too that the density, intensity (represented by the 
blackness of the lines), and spectral extent (expansion toward the higher frequencies) 
can be seen to increase from section 2 through section 9 to section 14, reflecting the 
increasing number, dynamic level, and registral spread of instruments involved. 
Figure 8.1 a. Spectrogram of the first 11 notes of the A melody from Bolero by Ravel
(section 2). In this example, horizontal lines below 1,000 Hz represent notes, horizontal
lines above 1,000 Hz their harmonic components. Percussive sounds appear as vertical
bars (0.4 seconds, 0.8 seconds, 1.3 seconds, etc.). b. Spectrogram of the first 11 notes of
the A melody from Bolero by Ravel (section 9). Notice the presence of instruments with
higher pitches (higher frequencies). c. Spectrogram of the first 11 notes of the A melody
from Bolero by Ravel (section 14). Notice the increase of intensity represented by
increased blackness.



Now consider an example of electronic music produced with synthesizers: an 
excerpt from Die Roboten, by the electronic rock group Kraftwerk (Figure 8.2). First, 
note the relatively clean lines of the spectrographic representation, with little of the 
fuzziness found in the previous example. This is primarily due to the absence of the 
noise components and random fluctuations that are characteristic of natural sounds 
resulting from the complex onsets of notes, breath sounds, rattling snares, and the 
like. Several features of Figure 8.2 will be used in the following discussion, but it is 
interesting to note that most of the perceptual qualities of these sounds are not no- 
tatable in a score and can only be identified by concentrated listening or by visually 
examining acoustic analyses such as the spectrogram: for example, the opening 
sound, which at the beginning has many frequency components (horizontal lines 
extending from the bottom [low frequencies] to the top [high frequencies]), slowly 
dies down to a point (at about 1.4 seconds) where only the lower frequencies are
present. This progressive filtering of the sound has a clear perceptual result that is
directly discernible from this representation.

Figure 8.2. Spectrogram of an excerpt of Die Roboten by Kraftwerk. Notice the readability
of the spectrographic representation that makes explicit continuous timbre, amplitude,
and frequency variations.

A spectrographic approach is also useful in the case of cultures in which music 
is transmitted by oral tradition rather than through notation. A telling example is the 
Inanga chuchote from Burundi, in which the singer whispers (chuchoter is French for 
"to whisper") and accompanies himself on a low-pitched lute (Figure 8.3). This mu- 
sical genre presents an interesting problem, in that the language of this people is 
tonal: the same syllable can have a different meaning with a rising or falling pitch 
contour. The fact that contour conveys meaning places a constraint on song pro- 
duction, since the melodic line must to some extent adhere to the pitch contour that 
corresponds to the intended meaning. But this is not possible with whispering, 
which has no specific pitch contour. The spectrogram reveals what is happening: the 
lute carries the melodic contour, reinforced by slight adjustments in the sound qual- 
ity of the whispering (it is brighter when the pitch is higher, and duller when it is 
lower). There is a kind of perceptual fusion of the two sources, due to their tempo- 
ral synchronization and spectral overlap, so that the pitch of the lute becomes "at- 
tached to" the voice. 

In light of these examples, we can see how an approach involving acoustical 
analysis and interpretation based on perceptual principles can be useful in analyzing
recorded sound. The spectrogram is just one possible means of representing
sounds: others bring out different aspects of the sound, and it is characteristic of all
such representations that different features can be brought out according to the settings
that are used in creating them. (This is similar to the way in which the correct
setting of a camera depends on what one wants to bring out in the photograph.) The
goal of this chapter is therefore to introduce some of these ways of representing
sound, and to provide a relatively nontechnical account of how such representations
work and how they are to be interpreted. The chapter is organized in three main sections,
the first dealing with basic characteristics and representations of sound, the
second with acoustical analysis, and the third with perceptual analysis; the two analytical
sections conclude with brief case studies taken from the literature.

Figure 8.3. Spectrogram of an excerpt of Inanga chuchote from Burundi. Notice the
movements of shaded zones, within the range 500 to 3,000 Hz, that represent timbral
variations of the whispered sounds. The onsets of the lute notes and plosive consonants
produced by the voice are indicated by the vertical lines in the representation.



Basic Characteristics and Representations of Sound 



Sound is a wave that propagates between a source and a receiver through a medium. 
(The source can be an instrument, a whole orchestra or loudspeakers; the receiver 
can be the ears of a listener or a microphone; the medium is usually air.) It can also 
be considered as a signal that conveys information from an instrument or loud- 
speaker to the ears of a listener, who decodes the information by hearing the time 
evolution of the acoustic wave, and recognizes instruments, notes played, a piece of
music, a specific performer, a conductor, and so on. Using machines to analyze sound 
signals involves structuring the information in a way that is similar to what the ear 
does; such analyses usually provide symbolic information or — as in this chapter — 
graphical representations. 

The analysis of a sound, then, starts with a microphone that captures variations 
in air pressure (produced by a flute, for example) and transduces them into an elec- 
trical signal. This signal can be represented as a mathematical function of time, and 
is therefore called a temporal representation of the sound. A graphical display of such 
a temporal representation is an intuitive way to begin to analyze it, and in the case 
of a solo flute note we might get the temporal representation in Figure 8.4, with time 
on the horizontal axis and the amplitude of the signal on the vertical axis. The fig- 
ure reveals the way the sound starts (the attack), the sustained part with a slight os- 
cillation of the level, and the release of the flute sound at the end. However, it fails 
to help us in determining the nature of the instrument and the note played, and in 
the case of more complex sounds — say an excerpt of an orchestral composition — 
very little is likely to emerge beyond a rough impression of dynamic level. That 
means that we need to find alternative representations based on mathematical trans- 
formations of the simple temporal representation. The most important of these 
transformations involve the idea of periodicity, since this is intimately linked to the 
perception of pitch — a primary feature of most musical sounds. 
Figure 8.4. Temporal representation of a simple flute note.

Figure 8.5 is a simple temporal representation, like Figure 8.4, but the temporal
profile is shown at a much higher level of magnification. We can now begin to
see the repeated patterns that define periodic sounds: the term period refers to the
duration of the cycle, and the number of times the cycle repeats itself per second is
called the frequency (or fundamental frequency). Thus, the frequency is the reciprocal
of the period, and its unit is the Hertz (Hz). It can be seen from Figure 8.5 that six
periods are a little shorter than seven divisions of the time axis (which are hundredths
of a second), so that the fundamental frequency is 87.2 Hz, which is the F
at the bottom of the bass clef.
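
The arithmetic behind this reading can be made explicit with a short sketch. The duration of 0.0688 seconds is an assumed reading of "a little shorter than seven hundredths of a second", chosen to agree with the 87.2 Hz quoted above. The function names, and the equal-tempered note naming with A4 = 440 Hz, are likewise assumptions made for illustration.

```python
import math

def fundamental_frequency(n_periods, duration_s):
    """Frequency in Hz: the number of cycles divided by the time they occupy."""
    return n_periods / duration_s

def nearest_note(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered note name (C4 = middle C), assuming A4 = 440 Hz."""
    names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return f"{names[midi % 12]}{midi // 12 - 1}"

# Assumed reading of the time axis: six periods in about 0.0688 s.
f0 = fundamental_frequency(6, 0.0688)
print(round(f0, 1), nearest_note(f0))  # prints: 87.2 F2
```

The period itself is then 0.0688 / 6, or roughly 11.5 milliseconds, and its reciprocal gives the same frequency.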

While frequency determines the pitch of the clarinet sound in Figure 8.5, the 
particular shape of the wave is related to factors that determine its timbral proper- 
ties. How might it be possible to classify or model the range of different shapes that 
sound waves can take? A spectral representation (or spectrum) attempts to model 
sounds through the superimposition of any number of waves of different frequen- 
cies, with each individual wave taking the form of a "sinusoid": this is a function that 
endlessly oscillates at a given frequency, and which can be approximated by the sus- 
tained part of the sound of a struck tuning-fork. Figure 8.6a shows a few sinusoidal 
oscillations at a frequency of 440 Hz (the standard tuning fork A). Now if, instead 
of showing amplitude against time as in Figure 8.6a, we were to show it against fre- 
quency, we would see a single vertical line corresponding to 440 Hz: this is shown in 
Figure 8.6b, a much more condensed and exhaustive representation of the signal by
its spectral content than by its temporal representation. The same representation can
obviously show any number of different spectral components (different sinusoids)
at different levels of amplitude: Figure 8.7a shows a temporal representation of a mix
of three sinusoids at different amplitudes, and Figure 8.7b the resulting spectrum.
While the number, frequency, or amplitude of the individual components are difficult
to estimate from Figure 8.7a, all are immediately apparent from Figure 8.7b.

Figure 8.5. Simple periodic sound: six periods of a bass clarinet sound.

Figure 8.6 a. Sinusoidal sound (frequency = 440 Hz, amplitude = 1.0): temporal
representation with a single period indicated. b. Sinusoidal sound (frequency = 440 Hz,
amplitude = 1.0): spectral representation.
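
The passage from a temporal to a spectral representation can be tried out on a mix of this kind. The sketch below is only an illustration: the sample rate, the signal length, and the three amplitudes (1.0, 0.5, and 0.25) are assumptions, and the spectrum is computed with a plain, slow discrete Fourier transform written out in full rather than with any particular analysis package.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
N = 800              # 0.1 s of signal, giving 10 Hz between analysis bins

# (frequency in Hz, amplitude); the amplitudes are assumed for illustration.
components = [(440, 1.0), (550, 0.5), (660, 0.25)]

# Temporal representation: the sample-by-sample sum of the three sinusoids.
signal = [
    sum(a * math.sin(2 * math.pi * f * n / SAMPLE_RATE) for f, a in components)
    for n in range(N)
]

def amplitude_spectrum(x):
    """Amplitude of each frequency bin below half the sample rate (plain DFT)."""
    size = len(x)
    spectrum = []
    for k in range(size // 2):
        total = sum(x[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
        spectrum.append(2 * abs(total) / size)
    return spectrum

# Spectral representation: keep only the bins that carry appreciable energy.
spectrum = amplitude_spectrum(signal)
bin_hz = SAMPLE_RATE / N
peaks = [(k * bin_hz, round(a, 3)) for k, a in enumerate(spectrum) if a > 0.01]
print(peaks)
```

Printing `signal` yields 800 numbers from which the three components are hard to pick out, whereas `peaks` lists each component's frequency and amplitude directly.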

As there are only three sinusoids in Figures 8.7a and 8.7b, all in the central au- 
ditory range and at relatively similar dynamic levels, each will be heard as a separate 
pitch: in fact, since their frequencies are 440, 550, and 660 Hz, the percept will be 
an A major chord. A different set of sinusoids, by contrast, might produce the effect 
of a single pitch with a distinctive timbre. Because it represents only the acoustical 
qualities of the signal, not its perceptual correlates, the difference cannot be directly 
seen in a spectral representation. 
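
Why these three frequencies are heard as an A major chord can be checked with a little arithmetic. The sketch below assumes equal temperament with A4 = 440 Hz for the note names; the `describe` helper is a name introduced here for illustration.

```python
import math
from fractions import Fraction

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def describe(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered note and the deviation from it in cents."""
    exact = 69 + 12 * math.log2(freq_hz / a4_hz)
    midi = round(exact)
    return f"{NAMES[midi % 12]}{midi // 12 - 1}", round(100 * (exact - midi), 1)

freqs = [440, 550, 660]
ratios = [Fraction(f, freqs[0]) for f in freqs]
print(ratios)                        # ratios to the lowest component: 1, 5/4, 3/2
print([describe(f) for f in freqs])  # nearest note names with deviations in cents
```

The ratios 1 : 5/4 : 3/2 (that is, 4 : 5 : 6) are those of a justly tuned major triad. The nearest equal-tempered notes are A, C sharp, and E, with the middle component lying about 14 cents below the tempered C sharp.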

The principle of decomposing a complex waveform into separate elements can 
be taken a good deal further than this. The mathematician Joseph Fourier demon- 
strated that a periodic signal, whatever the shape of its waveform, can always be an- 
alyzed into a set of harmonically related sinusoids ("harmonically," meaning that the 
frequencies of these sinusoids are multiples of the fundamental frequency — as, for 
instance, 440, 880, 1320 Hz, and so on are integer multiples of 440 Hz). The col- 
lection of these harmonics constitutes the Fourier series of the signal, and is an im- 
portant property since it roughly corresponds to the way sounds are analyzed by the 
auditory system. In practice, then, analyzing a periodic or harmonic signal consists 
of determining the fundamental frequency and the amplitude of each harmonic 
component. Figure 8.8 compares waveforms and Fourier analyses of two simple sig- 
nals often used in commercial synthesizers. (Note that in the Fourier representations 
in Figures 8.8a2 and 8.8b2 the frequency axis shows values as multiples of 10^4 Hz:
thus "1" represents 10,000 Hz or 10 kHz.) The equidistant vertical lines represent 
the different harmonics of the fundamental frequency, which is again 440 Hz. Com- 
parison of the sawtooth waveform (Figure 8.8a2) with a square wave (Figure 8.8b2) 
shows that the latter lacks even-numbered harmonics. Figure 8.8c, by contrast, 
shows the spectrum of the clarinet sound from Figure 8.5, which — while it exhibits 
vertical harmonic peaks — does not look as clean as the synthesized examples. 
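
The contrast between the two synthesizer waveforms can be reproduced numerically. The following sketch is an illustration under stated assumptions: each waveform is an idealized single period sampled at 1,024 points, and the harmonic amplitudes are obtained with a plain discrete Fourier transform. If the fundamental is 440 Hz, harmonic k lies at 440k Hz.

```python
import cmath
import math

N = 1024  # samples in exactly one period (assumed resolution)

# One period of each idealized waveform, with values between -1 and +1.
sawtooth = [2 * n / N - 1 for n in range(N)]
square = [1.0 if n < N // 2 else -1.0 for n in range(N)]

def harmonic_amplitudes(period, n_harmonics):
    """Amplitudes of harmonics 1..n_harmonics of one sampled period (plain DFT)."""
    size = len(period)
    amps = []
    for k in range(1, n_harmonics + 1):
        total = sum(period[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
        amps.append(2 * abs(total) / size)
    return amps

saw_amps = harmonic_amplitudes(sawtooth, 8)
square_amps = harmonic_amplitudes(square, 8)
for k, (s, q) in enumerate(zip(saw_amps, square_amps), start=1):
    print(k, round(s, 3), round(q, 3))  # harmonic number, sawtooth, square
```

In the printed table every harmonic of the sawtooth is present, with amplitudes falling off as the harmonic number rises, while the even-numbered harmonics of the square wave vanish.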

In addition to harmonic signals, there are inharmonic signals such as the sounds 
of bells (Figure 8.9a) or tympani (Figure 8.9b): these are not periodic, but can still 
be described as a series of superimposed sinusoids — although the sinusoids are no 
longer harmonically related. (They are therefore called partials rather than harmon- 
ics, and can take any frequency value.) Inharmonic sounds do not have a precise 
overall pitch, though they may have a pitch that corresponds to the dominant partial,
or even several pitches. The bell sound in Figure 8.9a includes a series of near-harmonic
components (a "fundamental frequency" at 103 Hz (G#), a second harmonic
at 206 Hz, a ninth one at 927 Hz, a thirteenth one at 1,339 Hz, and so on)
but also other inharmonic components; these components give the sound a chord- 
like quality. The tympani spectrum in Figure 8.9b is similarly inharmonic, with some 
partials conforming to a nearly harmonic relationship with a fundamental frequency 
at 66 Hz (C). Indeed there are many sounds that we think of as clearly pitched but 
which are slightly inharmonic: Figure 8.9c shows the spectrum of a piano note
whose partial components are near-harmonics of the fundamental frequency 831 Hz
(g#''), and one can see that the inharmonicity, represented by the difference between
the positions of the actual partials and the theoretical positions shown by
dashed lines, increases with frequency. (The cluster of low-frequency components,
below the fundamental up to around 2,500 Hz, is produced not by the vibration of
the piano strings, but by the soundboard.) Here the deviation from harmonicity at
frequencies below about 5 kHz is sufficiently small that the listener perceives a precise
pitch.

Figure 8.7. Mix of three sinusoids (frequencies are 440, 550, and 660 Hz; amplitudes
0.5, and 0.25, respectively): a. Temporal representation. b. Spectral representation.

For the sound characterization to be complete, the category of noisy sounds has 
to be considered. By definition, these have a random temporal representation (such 
as the whispered sound between 2.2 and 3.2 seconds in Figure 8.3). Few instru- 
mental sounds are completely noisy, but most of them include a certain amount of 
noise (the player's breath in the case of wind instruments, impact noise for percus- 
sion, and so on), and as we shall later see, this noisy part is nearly always very im- 
portant for the perception of timbre. The sounds of wind and surf, replicated by 
electronic noise generators, are by contrast completely noisy. Such sounds are com- 
posed of a random mix of all possible frequencies, and their spectral representation 
is the statistical average of the spectral components. For example, white noise has a 
random waveform (Figure 8.10a). Although it has a spectrum that is theoretically 
flat, with all frequencies appearing at the same level, in practice the spectrum
revealed by Fourier analysis is far from being perfectly flat (Figure 8.10b), and exhibits
variations that are due to the lack of averaging of the random fluctuations in level of 
the different frequencies. These fluctuations can be reduced by taking the average of 
several spectra computed on successive time-limited samples of the noisy sound. 
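
The averaging of successive spectra described above can be tried out in a few lines of Python. What follows is only a minimal sketch using NumPy: the frame size of 1,024 samples, the use of non-overlapping frames, and the "relative spread" measure of flatness are assumptions made for the illustration, not settings taken from this chapter.

```python
import numpy as np

SAMPLE_RATE = 44100   # Hz (compact disc rate)
FRAME_SIZE = 1024     # samples per time-limited frame (assumed value)

rng = np.random.default_rng(0)             # fixed seed so the run is repeatable
noise = rng.standard_normal(SAMPLE_RATE)   # one second of white noise

def power_spectrum(frames):
    """Power spectrum along the last axis (one frame, or one frame per row)."""
    return np.abs(np.fft.rfft(frames, axis=-1)) ** 2 / frames.shape[-1]

def relative_spread(spectrum):
    """Standard deviation divided by mean; 0 would be a perfectly flat spectrum."""
    return spectrum.std() / spectrum.mean()

# Cut the noise into successive, non-overlapping frames.
n_frames = len(noise) // FRAME_SIZE
frames = noise[: n_frames * FRAME_SIZE].reshape(n_frames, FRAME_SIZE)

single_spectrum = power_spectrum(frames[0])               # one frame only
averaged_spectrum = power_spectrum(frames).mean(axis=0)   # mean over all frames

spread_single = relative_spread(single_spectrum)
spread_averaged = relative_spread(averaged_spectrum)
print(f"{n_frames} frames: spread of single spectrum {spread_single:.2f}, "
      f"of averaged spectrum {spread_averaged:.2f}")
```

The averaged spectrum should come out markedly flatter than the spectrum of any single frame, at the cost of describing a longer stretch of the sound.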

Now that the basic characteristics of sounds have been described, it is impor- 
tant to mention one major aspect of "natural" sound signals: their characteristics 
(frequency, amplitude, waveform, inharmonicity, or noise content) always vary over 
time. Sounds with perfectly stable characteristics (such as the sinusoids in Figure 
8.6, or the stable low-frequency sound in Die Roboten, between 2 and 3 seconds in 
Figure 8.2) sound "unnatural" or "synthetic." These time-varying characteristics are 
called modulations and can take various forms. Amplitude modulations range from 
uncontrolled random fluctuations, such as in the flute note in Figure 8.4, to the 
tremolo on the low sustained note in Die Roboten (between 1.5 and 2 seconds in Fig- 
ure 8.2); frequency modulations can include vibrato (the undulating horizontal lines 
produced by the piccolo during the first 1.5 seconds of Figure 8.1b) or pitch glides 
(the upward sweeping "chirp" sound in Die Roboten between 3.8 and 4.5 seconds, 
Figure 8.2). Apart from their impact on the character of individual sounds, the pres- 
ence and synchronization of modulations are very important for the perception of 
fused sounds, and will be discussed in the final section of this chapter. 
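
The three kinds of modulation mentioned above (tremolo, vibrato, and pitch glide) can be imitated synthetically. The sketch below is purely illustrative: the carrier frequency, modulation rate, and modulation depths are assumed values chosen for the example, and the signals are not reconstructions of the sounds shown in Figures 8.1, 8.2, or 8.4.

```python
import numpy as np

SAMPLE_RATE = 44100     # Hz
DURATION = 2.0          # seconds (assumed)
CARRIER_HZ = 440.0      # assumed steady pitch
MOD_RATE_HZ = 6.0       # assumed modulation rate
TREMOLO_DEPTH = 0.5     # assumed: level dips to half its maximum
VIBRATO_DEV_HZ = 10.0   # assumed frequency excursion either side of the carrier

t = np.arange(int(SAMPLE_RATE * DURATION)) / SAMPLE_RATE

def tone_from_frequency(freq_hz):
    """Sinusoid whose instantaneous frequency follows freq_hz (one value per sample)."""
    phase = 2 * np.pi * np.cumsum(freq_hz) / SAMPLE_RATE
    return np.sin(phase)

# Amplitude modulation (tremolo): the level oscillates, the frequency is fixed.
tremolo_envelope = 1.0 - TREMOLO_DEPTH * 0.5 * (1 + np.sin(2 * np.pi * MOD_RATE_HZ * t))
tremolo = tremolo_envelope * np.sin(2 * np.pi * CARRIER_HZ * t)

# Frequency modulation (vibrato): the frequency oscillates, the level is fixed.
vibrato_freq = CARRIER_HZ + VIBRATO_DEV_HZ * np.sin(2 * np.pi * MOD_RATE_HZ * t)
vibrato = tone_from_frequency(vibrato_freq)

# Pitch glide: the frequency rises steadily, here by one octave.
glide_freq = np.linspace(CARRIER_HZ, 2 * CARRIER_HZ, len(t))
glide = tone_from_frequency(glide_freq)
```

A spectrogram of `vibrato` would show an undulating horizontal line and one of `glide` a rising line, while the tremolo is easiest to see in the temporal representation.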

This means that there is a significant element of approximation or idealization
in all but the simplest representations of sound which we have been discussing.
Most obviously, spectral representations (Fourier or otherwise) relate amplitude and
frequency: they represent values averaged over a discrete temporal "window" or
"frame," and therefore tell us nothing about changes in the sound during that period
of time. (One could represent the changes by showing a series of spectral
representations one after another, in the manner of an animation, but even then each
frame would represent the average over a given time span.) There is also a similar
point in relation to periodicity. While from a mathematical point of view a periodic
signal reproduces exactly the same cycle indefinitely, in the real world any sound has
a beginning and an end: for this reason alone, musical sounds are not mathematically
periodic. Moreover, they nearly always show slight differences from one period to
the next (as can be seen from the waveform of the clarinet sound in Figure 8.5). In
practice, a sound is perceived as having a definite pitch as soon as there is a
sufficient degree of periodicity, not when it is mathematically periodic, and we
therefore need an analytical tool that will identify degrees of periodicity on a local
basis, showing how the spectrum changes over time. This is exactly what the
spectrogram does, as illustrated in the introduction of this chapter, and this, again,
is something to which we will return.

Figure 8.8 a1. Sawtooth waveform. a2. Spectrum (Fourier series) of a sawtooth
waveform. b1. Square waveform. b2. Spectrum (Fourier series) of a square waveform.
c. Spectrum (Fourier series) of a clarinet waveform.

Figure 8.9 Inharmonicity: a. Spectrum of a bell sound. The series of vertical thin lines
represent theoretical locations of a harmonic series of fundamental frequency 103 Hz.
b. Spectrum of a tympani sound. The series of vertical thin lines represent theoretical
locations of a harmonic series of fundamental frequency 66 Hz. c. Spectrum of a piano
sound. The series of vertical dashed lines represents theoretical locations of a harmonic
series of a fundamental frequency of 831 Hz.

Figure 8.10 Examples of a noisy sound (white noise) a. Temporal representation,
b. Spectral representation. Notice that the spectrum of white noise is random and more
or less flat on average.



Acoustical Analysis of Sounds 

A waveform display allows a user to analyze a sound quite intuitively by simply 
looking at its temporal representation. The representation can be created at different 
time scales, from the "microscopic" or short-term scale, to the "macroscopic" or 
long-term scale. A "microscopic" time scale, which in practice is usually on the order 
of a few periods, preserves the waveform shape, and allows a qualitative evaluation 
of the presence or absence of noise and its level, as well as the presence of strong, 
high-order harmonic components. Figures 8.5a, 8.6a, 8.7a, 8.8a1, and 8.8b1 are
temporal representations of signals at a "microscopic" time scale: this also allows one 
to evaluate precisely the synchronization between acoustic events. By contrast, a 
"macroscopic" time scale makes visible long-term tendencies such as the global evo- 
lution of the sound level, amplitude changes in the course of a melodic line, or the 
way notes follow one another; an example is Figure 8.4 in which the temporal en- 
velope of a flute sound can be discerned. 

Sound signals are usually displayed on computer screens using sound editor 
programs; some examples are ProTools, AudioSculpt, Peak, SpectroGramViewer, and 
Audacity. However, there is a problem when the number of samples to be represented
on the screen becomes larger than the number of available pixels: as sounds are
usually sampled at 44,100 Hz (the compact disc standard), and as the highest number
of pixels on each line of current screens is usually less than 2,000, a complete display
is possible only for durations shorter than 50 milliseconds (ms). Beyond that,
several sample values have to be averaged into one pixel value (in other words, the
signal has to be smoothed), and this prevents precise investigation of long-term sound
characteristics, limiting the direct use of "macroscopic" sound signal displays to the 
analysis of global temporal evolutions. Even in the case of these global evolutions, 
though, there is a perceptually more meaningful means of analysis: temporal envelope 
estimation. 
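
The arithmetic behind the display limit, and the smoothing that results once several samples share a pixel, can be checked with a short sketch. It assumes a screen line of exactly 2,000 pixels and uses a 5 kHz test tone; both values, and the helper name `average_into_pixels`, are chosen only for the illustration, and real sound editors may reduce the data in other ways.

```python
import numpy as np

SAMPLE_RATE = 44100     # Hz (compact disc standard)
SCREEN_PIXELS = 2000    # assumed number of horizontal pixels

# Longest duration that can be shown with one sample per pixel.
max_duration_ms = 1000 * SCREEN_PIXELS / SAMPLE_RATE

def average_into_pixels(signal, n_pixels=SCREEN_PIXELS):
    """Reduce signal to n_pixels values by averaging consecutive blocks of samples."""
    blocks = np.array_split(signal, n_pixels)
    return np.array([block.mean() for block in blocks])

# One second of a 5 kHz tone (assumed test signal) squeezed into one screen line.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 5000 * t)
columns = average_into_pixels(tone)

print(f"one sample per pixel is possible up to {max_duration_ms:.1f} ms")
print(f"peak level before averaging {np.abs(tone).max():.2f}, "
      f"after averaging {np.abs(columns).max():.2f}")
```

With these values the limit comes out at about 45 ms, and the averaged display shows only a small fraction of the tone's true amplitude, which is why such a display is of little use for fine detail.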

As an example, Figure 8.11 displays the dynamic evolution of an excerpt from 
Ligeti's orchestral piece Atmospheres, starting with a slow crescendo/decrescendo over
the first 20 seconds, followed by a fast crescendo to a median level, and a slow cres- 
cendo for the next 30 seconds, and so on. Such a temporal envelope, which is cal- 
culated automatically by most sound editing programs, is again based on a series of 
temporal windows or frames, the duration of which is normally between 10 and 200 
ms, with the window sliding along the time axis to provide an estimate of the tem- 
poral envelope at set intervals. The only setting that you need to adjust in order to 
get a temporal envelope is the size of the window. The chosen size will inevitably be 
a compromise: it has to be large enough to smooth over the fine-grained oscillations 
of the signal, but at the same time small enough to preserve the shape of the attack, 
or of the other transient parts of the sound. The problem can be seen by comparing
three different temporal envelope estimations for the temporal representation shown
in Figure 8.12a, the spectrogram of which was shown in Figure 8.2. In Figure 8.12b,
too small a window (5.8 ms) has been chosen, and the profile exhibits spurious
oscillations that are not perceived as changes in level; Figure 8.12c represents
an appropriate value (23.2 ms), whereas in Figure 8.12d the window is too large
(92.8 ms) and fast changes of level (particularly between 1.5 and 2 seconds) are
smoothed out. The rule of thumb is to use a window longer than the longest period
contained in the original sound; musicologists who want to use a temporal envelope
analysis therefore need to set the window size in accordance with the particular
music they are studying, as well as the particular aspects of the sound in which
they are interested.

Figure 8.11. Temporal envelope estimation for an excerpt from Ligeti's Atmospheres.
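
The effect of window size on a temporal envelope can be reproduced on a synthetic signal. The sketch below makes several assumptions that are not taken from this chapter: the envelope is computed as the root-mean-square level in each window (sound editors may use other measures), the window advances in steps of 64 samples, and the test signal is a steady 100 Hz tone whose level never changes. The three window sizes correspond to 5.8, 23.2, and 92.8 ms at 44,100 Hz.

```python
import numpy as np

SAMPLE_RATE = 44100   # Hz
HOP = 64              # assumed step, in samples, between successive windows

def rms_envelope(signal, window_size, hop=HOP):
    """Root-mean-square level of signal in a window sliding along the time axis."""
    starts = range(0, len(signal) - window_size + 1, hop)
    return np.array([np.sqrt(np.mean(signal[s:s + window_size] ** 2))
                     for s in starts])

# A steady 100 Hz tone: its period is 10 ms and its level is constant,
# so any oscillation in the estimated envelope is spurious.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 100 * t)

ripple = {}
for window_size in (256, 1024, 4096):   # about 5.8, 23.2, and 92.8 ms
    envelope = rms_envelope(tone, window_size)
    ripple[window_size] = envelope.max() - envelope.min()
    print(f"window of {1000 * window_size / SAMPLE_RATE:.1f} ms: "
          f"ripple {ripple[window_size]:.3f}")
```

The 5.8 ms window is shorter than the 10 ms period of the tone and produces the largest spurious ripple; the longer windows smooth it away, though they would equally smooth away a genuinely fast change of level.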

For many musicological purposes a spectral representation will be the best 
choice for sound signal analysis. There are, however, some practical problems asso- 
ciated with it: one is how to estimate the spectrum of nonperiodic sounds, while an- 
other is the relationship between the sampling rate, the sampling window, and the 
frequency of the sounds being studied. To be completely known, a nonperiodic sig- 
nal has to be observed over its entire duration, unlike a periodic signal (where ob- 
servation over one period is sufficient). This means that a Fourier series representa- 
tion of a nonperiodic sound is not possible: the appropriate representation is instead 
a Fourier transform. A simple way to understand the extension of the Fourier series 
to the Fourier transform is to think of a nonperiodic signal as a periodic signal whose
period is infinite. An example of a nonperiodic signal and its Fourier transform is
given in Figure 8.13. The sound signal is a damped sinusoid, that is, a sinusoid
whose amplitude decreases over time. The treatment of this kind of signal is
significant, since many percussion instruments (including bells, timpani, pianos, and
xylophones) produce a superposition of damped sinusoids.

Figure 8.12 a. Temporal representation of the first three seconds of Die Roboten by
Kraftwerk. b. Temporal envelope estimation of the sound signal displayed in (a)
using a window size of 5.8 ms.

The Fourier transform (Figure 8.13b) exhibits a maximum close to the oscillat- 
ing frequency, but there is some power at other frequencies, particularly those close 
to the central frequency. The sharpness of the maximum, sometimes called a
formant, varies in inverse proportion to the degree of damping. The damping makes the
oscillating sinusoid nonperiodic and spreads its power out to other frequencies; the 
spectrum is therefore no longer a peak (as it is for a pure sinusoid) but a smoothed 
curve, whose bandwidth (the width of the curve) increases with the damping value.

Figure 8.12 (cont'd.) Temporal envelope estimation of 
the sound signal displayed in (a) using: c. A window size 
of 23.2 ms. d. A window size of 92.8 ms. 
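
The way in which damping broadens the spectrum of a decaying sinusoid, as in Figure 8.13, can be checked numerically. The sketch below assumes an exponential decay with two arbitrary damping values (10 and 50 per second), and measures the bandwidth as the width of the peak at half its maximum power; that measure is a choice made for the illustration. For an exponentially decaying sinusoid this width is, by a standard result that does not come from this chapter, approximately the damping value divided by pi.

```python
import numpy as np

SAMPLE_RATE = 44100   # Hz
FREQ_HZ = 440.0       # oscillating frequency, as in Figure 8.13

# One second of signal, so that spectral bins are 1 Hz apart.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE

def half_power_bandwidth(damping):
    """Width in Hz of the spectral peak, measured at half its maximum power,
    for the damped sinusoid exp(-damping * t) * sin(2 * pi * FREQ_HZ * t)."""
    signal = np.exp(-damping * t) * np.sin(2 * np.pi * FREQ_HZ * t)
    power = np.abs(np.fft.rfft(signal)) ** 2
    bin_width_hz = SAMPLE_RATE / len(signal)
    return np.count_nonzero(power >= 0.5 * power.max()) * bin_width_hz

# Two assumed damping values (in units of 1/second).
bandwidths = {d: half_power_bandwidth(d) for d in (10.0, 50.0)}
for damping, width in bandwidths.items():
    print(f"damping {damping:.0f}/s: bandwidth about {width:.0f} Hz "
          f"(damping / pi = {damping / np.pi:.1f} Hz)")
```

The more heavily damped sinusoid gives the broader and less sharply peaked curve.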



As in the case of temporal envelope estimation, choosing the right duration as
the basis for calculating the Fourier transform is a compromise: the duration needs
to be large enough to maintain sufficient resolution between closely adjacent
sinusoids, and yet not so large as to average out all of the temporal evolution of the
sound's spectral characteristics. A good compromise is usually a duration of four to
five times the period of the smallest frequency difference between the sinusoidal
components of the sound. Figure 8.14 revisits the spectral analysis of the A major
chord, made up of three sinusoidal sounds, that was shown in Figure 8.7: since the
frequencies are 440, 550, and 660 Hz, the smallest frequency difference is 110 Hz,
which (at a 44,100 Hz sampling rate) corresponds to a period of approximately 400
samples. Figure 8.14 shows that a window size of 2,048 samples (i.e., approximately
5 times 400 samples) successfully separates out the three components.

Figure 8.13 Nonperiodic signal a. A damped sinusoid (frequency = 440 Hz), b. Spectrum
of a damped sinusoid (frequency = 440 Hz). Notice the continuous aspect of the
spectrum.

Figure 8.14. Time-limited Fourier analysis: spectrum of the major chord from
Figure 8.7 estimated with a window size of 2,048 samples.
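
The choice of window size for the chord of 440, 550, and 660 Hz can be tested directly. The following is a sketch under stated assumptions rather than a reconstruction of Figure 8.14: the amplitudes 1, 0.5, and 0.25 are assumed, the window is tapered with a Hann function, and a "component" is counted as any local maximum reaching at least one tenth of the largest spectral value. The function name `peak_frequencies` is hypothetical.

```python
import numpy as np

SAMPLE_RATE = 44100                 # Hz
FREQS_HZ = (440.0, 550.0, 660.0)    # the three sinusoids of the chord
AMPLITUDES = (1.0, 0.5, 0.25)       # assumed amplitudes

# Period, in samples, of the smallest frequency difference (110 Hz).
smallest_difference_hz = min(np.diff(FREQS_HZ))
period_samples = SAMPLE_RATE / smallest_difference_hz

t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
chord = sum(a * np.sin(2 * np.pi * f * t) for a, f in zip(AMPLITUDES, FREQS_HZ))

def peak_frequencies(signal, window_size, rel_threshold=0.1):
    """Frequencies (Hz) of the local maxima in the spectrum of the first
    window_size samples, ignoring maxima below rel_threshold * largest value."""
    frame = signal[:window_size] * np.hanning(window_size)   # Hann taper (assumed)
    magnitude = np.abs(np.fft.rfft(frame))
    middle = magnitude[1:-1]
    is_peak = ((middle > magnitude[:-2]) & (middle >= magnitude[2:])
               & (middle >= rel_threshold * magnitude.max()))
    return (np.flatnonzero(is_peak) + 1) * SAMPLE_RATE / window_size

long_window_peaks = peak_frequencies(chord, 2048)   # about 5 periods of 110 Hz
short_window_peaks = peak_frequencies(chord, 256)   # well under one period

print(f"period of the 110 Hz difference: {period_samples:.0f} samples")
print("2,048-sample window:", np.round(long_window_peaks, 1))
print("256-sample window:  ", np.round(short_window_peaks, 1))
```

With the 2,048-sample window each of the three components appears as a separate peak, located to within one spectral bin (about 21.5 Hz); with the 256-sample window the components merge and can no longer be told apart.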



We are now ready to come full circle, establishing the link between all the con- 
cepts developed in this section and the spectrographic representation at the very be- 
ginning of the chapter. A spectrum, resulting from a Fourier transform performed 
over a finite duration, provides useful information about a sound signal only when 
the sound is known to be stable in time. However, as already mentioned, the char- 
acteristics of natural sounds always vary in time. A spectrum taken from a window 
located at the beginning of a sound (Figure 8.15a) is usually different from a spec- 
trum taken from a window located in the middle of the sound (Figure 8.15b). In 
order to describe the temporal variations of the spectral properties of a sound, a 
simple idea is to compute a series of evenly spaced local spectra. This is achieved by 
computing a Fourier transform for each of a series of sliding windows taken from 
the signal. The time-step increment of the sliding window is usually a proportion of 
the window size, and a time-shift of an eighth of the window size or less ensures per- 
fect tracking of the temporal evolution. Each Fourier transform represents an esti- 
mate of the spectral content of the signal at the time on which the window is cen- 
tered, and a simple and efficient way to display this series of spectra is to create a 
time/frequency representation, with the darkness of the trace representing the am- 
plitude of each frequency (Figure 8.16). This representation, which we have already 
encountered, is called a spectrogram (or sometimes a sonogram). 
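
The construction of a spectrogram from a series of sliding windows can be summarized in a short sketch. It uses the window size and time-step increment given for Figure 8.16 (4,096 and 512 samples), but everything else is an assumption made for the illustration: the Hann taper, the function name `spectrogram`, and the test signal, which is simply one second at 440 Hz followed by one second at 880 Hz.

```python
import numpy as np

SAMPLE_RATE = 44100   # Hz
WINDOW_SIZE = 4096    # samples, as in Figure 8.16
HOP = 512             # time-step increment: one eighth of the window size

def spectrogram(signal, window_size=WINDOW_SIZE, hop=HOP):
    """Magnitude spectra of successive sliding windows.
    Returns an array with one row per window and one column per frequency bin."""
    window = np.hanning(window_size)   # Hann taper (assumed)
    starts = range(0, len(signal) - window_size + 1, hop)
    frames = np.array([signal[s:s + window_size] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

# Assumed test signal: one second at 440 Hz, then one second at 880 Hz.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
signal = np.concatenate([np.sin(2 * np.pi * 440 * t),
                         np.sin(2 * np.pi * 880 * t)])

spec = spectrogram(signal)
bin_hz = SAMPLE_RATE / WINDOW_SIZE
# Time on which each window is centered, and the strongest frequency in it.
frame_times = (np.arange(len(spec)) * HOP + WINDOW_SIZE / 2) / SAMPLE_RATE
peak_hz = spec.argmax(axis=1) * bin_hz

print(f"{spec.shape[0]} spectra of {spec.shape[1]} frequency bins each")
print(f"strongest frequency at {frame_times[0]:.2f} s: {peak_hz[0]:.0f} Hz")
print(f"strongest frequency at {frame_times[-1]:.2f} s: {peak_hz[-1]:.0f} Hz")
```

Plotting `spec` with time on one axis, frequency on the other, and magnitude as darkness would give the kind of display shown in Figure 8.16.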

In summary, we have seen two different kinds of two-dimensional representation
of sound, and a three-dimensional representation. The two-dimensional
representations are temporal representations (amplitude against time), and spectral
representations (amplitude against frequency); the three-dimensional representation,
the spectrogram, shows frequency against time on the two axes, and intensity
in the blackness of the trace.

Figure 8.15 a. Spectrum of a bass clarinet tone during the attack (window centered on
0.12 second, window size of 4,096 samples), b. Spectrum of a bass clarinet tone during
the sustained part (window centered on 1 second, window size of 4,096 samples).

Figure 8.16. Spectrogram of a bass clarinet sound (window size of 4,096 samples,
time-step increment of 512 samples).



Analytical Applications 

The most sustained example of applying acoustical principles to musical analysis, 
and one which makes considerable use of spectrograms, is Robert Cogan's book New 
Images of Musical Sound (Cogan 1984). The central part of the book consists of a dis- 
cussion and analysis of 17 "spectrum photos" (the equivalent of spectrograms) of 
music from a wide variety of traditions, including Gregorian chant, jazz, a move- 
ment of a Beethoven piano sonata, electroacoustic music, and Tibetan tantric chant; 
the examples range from half a minute to over 11 minutes in duration, with the
majority around two to four minutes long. The spectrum photos show duration on 
the horizontal axis and frequency on the vertical axis, with intensity represented as 
the brightness of the trace (while intensity is represented by blackness against a 
white background in the spectrograms presented in this chapter, it is shown as 
whiteness against a black background in Cogan's photos). The spectrum photos 
were created using analog signal analysis equipment, with a camera used to take 
photographs of the cathode ray tube for successive sections of music; these were
then literally pasted together to create the resulting composite photos that appear in
the book. Digital technology has made it possible to produce more flexible and finely 
graded representations of this kind far more easily, as the figures in this chapter dem- 
onstrate. 

Cogan uses the spectrum photos to analyze and demonstrate a variety of differ- 
ent features of the music, the diversity of which is intended to show how many dif- 
ferent features can be addressed through these means. His discussion of Billie Holi- 
day's recording of "Strange Fruit," for example, focuses on the ways in which Holiday 
uses continuous pitch changes and the timbral effects of different vowel sounds to 
articulate semantic relationships in the text, and to expose its savage ironies: "Note 
bending is a motif that recurs with ever-increasing intensity. 'The gallant South' is 
immediately echoed with growing irony at 'sweet and fresh,' again bending to the 
voice's lowest depth. Then a string of increasingly bent phrases. . . . leads with gath- 
ering intensity to the explicit recall of the first stanza ..." (Cogan 1984: 35) 

By comparison, the discussion of Elliott Carter's Etude III for Wind Quartet fo- 
cuses on the timbral changes that result both from instrumental entries and exits, 
and from continuous dynamic changes, rhythmic augmentation, and diminution. 
This analysis is unusual in using spectral representations (rather than temporal rep- 
resentations) to give more detail about the relative balance of different spectral com- 
ponents in the sound than is possible in the spectrograms used elsewhere in the 
book: because time is eliminated from the representation, Cogan presents a se- 
quence of 18 spectral "snapshots" to demonstrate how the timbre evolves over the 
piece. Without getting involved in the detail of Cogan's analysis, a sense of what he 
claims such an analysis can achieve can be gathered from the following (Cogan 
1984: 71-72): 

Spectrum analysis provides a tool whereby the important similarity of these 
passages — the initial one characterized by instrumental change and rhythmic 
diminution, the climactic one by dynamic change and rhythmic augmenta- 
tion — can be discovered and shown. Remove the spectral features and the 
most critical formal links of the entire etude . . . disappear. Without spectral 
understanding, the link between the successive transformations — instrumen- 
tal, rhythmic, and dynamic . . . would evaporate. . . . We noted at the begin- 
ning of this commentary that, in the light of earlier analytic methods, this 
etude could emerge only as incomprehensible, static, or both. It now, how- 
ever, reveals itself to be a set of succinct, precise spectral formations whose 
roles and relationships, whether of identity or opposition, are clear at every 
instant. 

Two further examples of the way in which spectral analysis can be used for mu- 
sicological purposes are provided by Peter Johnson's (1999) discussion of two per- 
formances of the aria "Erbarme Dich" from Bach's St. Matthew Passion, and David 
Brackett's (2000) analysis of a track by Elvis Costello. The subject of Johnson's paper 
is a wide-ranging discussion of the relationship between performance and listening, 
with Bach's aria as a focal example viewed from aesthetic and more concretely ana- 
lytical perspectives. Central to Johnson's argument is his insistence that the impact 
of the sound of a performance on listeners' experience is consistently underestimated
by commentators and analysts, and an important part of the paper is thus devoted
to a detailed consideration of the acoustical characteristics of two performances of 
the aria. Johnson focuses primarily on differences in the frequency domain, high- 
lighting distinctions in the use of vibrato, timbre, and intonation in the first eight 
bars of the aria as taken from recordings directed by John Eliot Gardiner and Karl 
Richter. On the basis of both spectral and temporal representations of the sound 
characteristics of the first few bars of the instrumental opening, obtained using the 
signal processing and plotting software in the Matlab program, Johnson demon- 
strates how Gardiner's recording features a more transparent timbre, much less vi- 
brato in the solo violin part, and a more fluctuating amplitude profile, with a con- 
sistent tendency for the amplitude to drop away at group and phrase endings; 
Richter's recording, by contrast, demonstrates a more constant vibrato and ampli- 
tude level, a thicker timbre, and the use of expressive intonation (a flattening of the 
mediant note). 

Johnson acknowledges that 

much of what is shown by spectrographic analysis is little more than a visual 
analogue of what we have already recognized and perceived through listening. 
Nonetheless, acoustic analysis reinforces the experiential claims of the listen- 
ing musician, namely that (1) performance can significantly determine the 
properties of the experience itself, and (2) the listening experience is not 
wholly private: hearing is not entirely "subjective" in the sense of a strictly un- 
verifiable or purely solipsistic mode of perception. . . . Finally, acoustic analy- 
sis is a powerful medium for the education of the ear and as a diagnostic tool 
for the conscientious performer, the didactic possibilities of which have barely 
begun to be exploited. (Johnson 1999: 83-84) 

In fact Johnson uses the acoustic characteristics of the two recordings to argue that 
the two interpretations offer distinctly different musical and theological perspec- 
tives. Richter's recording, he claims, conveys a sense of reverence and authority (in 
relation both to Bach and the biblical narrative) in its even lines, thick textures, con- 
stant amplitude, tempo and vibrato, and solemnly "depressed" expressive intona- 
tion. Gardiner's, by contrast, is more enigmatic, using a faster and more flexible 
tempo and a more transparent sound to conjoin the secular connotations of dance 
with the seriousness of the biblical text; Johnson describes it as a "rediscovery in 
later 20th century Bach performance practice of the physical, the kinesthetic, not 
(here) as licentiousness but as a medium through which even a Passion can find new 
(or old) meanings." (Johnson 1999: 99) 

Brackett's (2000) use of spectrum photos is more cursory and restricted, but 
worth considering because of the comparative rarity of this approach in the study of 
popular music — perhaps surprisingly, given that it is a non-score-based tradition in 
which acoustic characteristics (such as timbre, texture and space) are acknowledged 
to be of particular importance. Brackett's aim in his chapter on Elvis Costello's song 
"Pills and Soap" is to demonstrate various ways in which Costello maintains an elu- 
sive relationship with different musical traditions — particularly in his negotiations 
with art music. Brackett uses spectrum photos similar to Cogan's to make points 
about both the overall timbral shape of "Pills and Soap," and more detailed aspects
of word setting. An example of the latter is Brackett's demonstration (p. 187) that
successive repetitions of the word "needle" in the song become increasingly timbrally
bright and accented, as a way of drawing attention to the word and its narrative/ 
semantic function. At a "middleground" level, he points out that vocal timbre (as 
well as pitch height) is used to give a sense of teleology to each verse, pushing the 
song forward. Finally (and this is where the connection with art music becomes 
more explicit), Brackett uses spectral information to support his claim that the song 
represents a particular kind of skirmish with Western art music. He shows how an 
increasingly oppositional relationship between high- and low-frequency timbral 
components characterizes the large-scale shape of the song, and argues that this 

is much more typical of pieces of Western art music than it is of almost any 
other form of music in the world, be it popular, "traditional," or non-Western 
"art" music. Examination of the photos in Robert Cogan's New Images of Musi- 
cal Sound reveals a greater similarity between the spectrum photo of "Pills 
and Soap" and the photos of a Gregorian chant, a Beethoven piano sonata, 
the "Confutatis" from Mozart's Requiem, Debussy's "Nuages," and Varese's 
Hyperprism, than between "Pills and Soap" and the Tibetan Tantric chant or 
Balinese shadow-play music. For that matter, the photo of "Pills and Soap" 
more closely resembles these pieces of art music than it does the photo for 
"Hey Good Lookin," the photo of which may reveal timbral contrast on a 
local level without that contrast contributing to a larger sense of teleological 
form. (Brackett 2000: 195) 

Whether the argument that Brackett advances here stands up to scrutiny or not 
(there might be all kinds of reasons why "Pills and Soap" doesn't have a spectral 
shape that looks anything like Tibetan chant, Balinese shadow-play music, or an- 
other arbitrarily chosen popular song), the point that it makes is that the empirical 
evidence provided by spectral and temporal representations can furnish an impor- 
tant tool in a musicological enterprise — and that is what this chapter is intended to 
demonstrate. 

The examples presented here, however, also illustrate some of the problems and 
pitfalls of using such information; it is very hard to find representational methods 
and analytical approaches that successfully reconcile detailed investigation with 
some sense of overall shape. Johnson's analysis, in focusing on the details of vibrato, 
timing, and intonation, doesn't go beyond bar 8 of the Bach aria; by contrast, Brack- 
ett's analysis of Costello, and many of Cogan's analyses, present spectrum photos at 
such a global level and with such inadequate resolution that some of the features and 
distinctions they discuss are all but invisible — and have to be taken on trust to more 
or less the same degree as if the authors were simply to tell the reader that the timbre 
gets brighter, or that there's a tiny articulation between phrases, or that there is an 
increasing accent on a word. In other words, there is a question about whether all 
the visual apparatus can really convince a reader of very much at all. 

In part this is a purely technological matter, and the technology has certainly 
improved dramatically since the time of Cogan's book. But as shown by the much 
more sophisticated representations that Johnson uses, and as argued in this chapter, 
the problem is by no means solved by technological progress: there is still a real
problem in extracting the salient features from a data representation that contains a
potentially overwhelming amount of information, only a tiny fraction of which may 
be relevant at any moment. The problem is testimony to the extraordinary analyti- 
cal powers of the human auditory system: in the mass of detail that is presented in 
a "close-up" view of the sound, the auditory system finds structure and distinctive- 
ness. Some of the principles that account for this human capacity, and the ways in 
which they may contribute to musicological considerations, are the subject of the 
final section of this chapter. 



Perceptual Analysis of Sounds 

Music presents a challenge to the human auditory system, because it often contains 
several sources of sound (instruments, voices, electronics) whose behavior is coor- 
dinated in time. In order to make sense of this kind of musical material, the charac- 
teristics of the individual sounds, of concurrent combinations of them, and of se- 
quences of them, must be identified by the auditory system. But to do this, the brain 
has to "decide" which bits of sound belong together, and which bits do not. As we 
will see, the grouping of sounds into perceptual units (events, streams, and textures) 
determines the perceived properties or attributes of these units. Thus, in consider- 
ing the perceptual impact of the sounds represented in a score or a spectrogram, it 
is necessary to keep in mind a certain number of basic principles of perceptual pro- 
cessing. 

Music played by several instruments presents a complex sound field to the 
human auditory system. The vibrations created by each instrument are propagated 
through the air to the listeners' ears, and combine with those of the other instru- 
ments as well as with the echoes and reverberations that result from reflections off 
walls, ceiling, furniture, and so on. What arrives at the ears is a very complex wave- 
form indeed. To make matters worse, this composite signal is initially analyzed as a 
whole. The vibrations transmitted through the ear canal to the eardrum and then 
through the ossicles of the middle ear are finally processed biomechanically in the 
inner ear (the cochlea), such that different frequency regions of the incoming signal 
stimulate different sets of auditory nerve fibers. This is the aural equivalent of the 
spectral analysis described in the first main section of this chapter; one might con- 
sider the activity in the auditory nerve fibers over time as a kind of neural spectro- 
gram. So if several instruments have closely related frequencies of vibration in their 
waveforms, they will collectively stimulate the same fibers: that is, they will be mixed 
together in the sequence of neural spikes that travel along that fiber to the brain. As 
we shall see, this would be the case for the different instruments playing the Bolero 
melody in parallel in a close approximation to a harmonic series. 

It should be noted, however, that the different frequencies are still represented 
in the time intervals between successive nerve spikes, since the time structure of the 
spike train is closely related to the acoustic waveform. Furthermore, a sound from a 
single musical instrument is composed of several different frequencies (see the bass 
clarinet example in Figure 8.16) and thus stimulates many different sets of fibers; 
that is, it is analyzed into separate components distributed across the array of
auditory nerve fibers. The problem that this presents to the brain is to aggregate the
separate bits that come from the same source, and to segregate the information that
comes from distinct sources. Furthermore, the sequence of events coming from the 
same sound source must be linked together over time, in order to follow a melody 
played by a given instrument. Let us consider a few examples of the kinds of prob- 
lem this poses. 

In some polyphonic music (such as Bach's orchestral suites or Ligeti's Wind 
Quintet), the intention of the composer is to create counterpoint, the success of 
which clearly depends on achieving segregation of the different instruments (Wright 
and Bregman 1987): what must be done to ensure that the instruments do not fuse 
together? In other polyphonic music, however (Ravel's Bolero, Ligeti's Atmospheres), 
the composer may seek a blending of different instruments and this would depend 
on achieving fusion or textural integration of the instruments: what must the musi- 
cians do to maximize the fusion and how can this be evaluated objectively? Finally, 
in some instrumental music an impression of two or more "voices" can be created 
from a monophonic source (such as in Telemann's recorder music or Bach's cello 
suites), or a single melodic line may be composed across several timbrally distinct 
instruments (as in Webern's Six Pieces for Large Orchestra, op. 6): what determines 
melodic continuity over time, and how might the integration or fragmentation be 
predicted from the score or for a given performance? For all these questions, the 
most important issue is how the perceptual result can be characterized from repre- 
sentations of the music (scores for notated music, acoustic representations for re- 
corded or synthesized music). Obviously one can simply listen and use an aurally 
based analytical approach, but this restricts the account to the analyst's own (per- 
haps idiosyncratic) perceptions; if the aim is to provide a more generalized inter- 
pretation, the solution is to use basic principles of auditory perception as tools for 
understanding the musical process. 

Grouping processes determine the perception of unified musical events (notes or 
aggregates of notes forming a vertical sonority), of coherent streams of events (hav- 
ing the connectedness necessary to perceive melody and rhythm), and of more or 
less dense regions of events that give rise to a homogeneous texture. Perceptual fu- 
sion is a grouping of concurrent acoustic components into a single auditory event (a 
perceptual unit having a beginning and an end); the perception of musical attributes 
such as pitch, timbre, loudness, and spatial position depends on which acoustic 
components are grouped together. Auditory stream integration is a grouping of se- 
quences of events into a coherent, connected form, and determines what is heard as 
melody and rhythm. Texture is a more difficult notion to define, and has been the 
object of very little perceptual research, but intuitively the perception of a homoge- 
neous musical texture requires a grouping of many events across pitch, timbre, and 
time into a kind of unitary structure, the textural quality of which depends on the 
relations among the events that are grouped together (certain works by Ligeti, 
Xenakis, and Penderecki come to mind, as do any number of electroacoustic works). 
Note that the main notion behind the word "grouping" is a kind of perceptual con- 
nectedness or association, called "binding" by neuroscientists. It seems clear that 
many levels of grouping can operate simultaneously, and that what is perceived de- 
pends to some extent on the kind of structure upon which a listener focuses. Since 
a large amount of scientific research has been conducted on concurrent and
sequential sound organization processes, we will consider these in more detail, before
moving on to discuss the perception of the musical properties (spatial location, 
loudness, pitch, timbre) that emerge from the auditory images formed by the pri- 
mary grouping process. 

There are two main factors that determine the perceptual fusion of acoustic 
components into unified auditory events, or their segregation into separate events: 
onset synchrony and harmonicity. A number of other factors were originally thought 
by perception researchers to be involved in grouping, but are probably more impli- 
cated either in increasing the perceptual salience of an event (vibrato and tremolo), 
or in allowing a listener to focus on a given sound source in the presence of several 
others (spatial position; for reviews see McAdams 1984, Bregman 1990, 1993, Dar- 
win and Carlyon 1995, Deutsch 1999). We will focus here on the grouping factors. 

Acoustic components that start at the same time are unlikely to arise from dif- 
ferent sound sources and so tend to be grouped together into a single event. Onset 
asynchronies between components on the order of as little as 30 ms are sufficient to 
give the impression of two sources and to allow listeners in some cases to identify 
the sounds as separate; to get a perspective on the accuracy necessary to produce 
synchrony within this very small time window, one might note that skilled profes- 
sional musicians playing in trios (strings, winds, or recorders) have asynchronies in 
the range of 30 to 50 ms, giving a sense of playing together while allowing percep- 
tual segregation of the instruments (Rasch 1988). If musicians play in perfect syn- 
chrony, by contrast, there is a greater tendency for their sounds to fuse together and 
for the identity of each instrument or voice to be lost. These phenomena can also be 
manipulated compositionally: Huron (1993) has shown by statistical analyses that 
the voice asynchronies used by Bach in his two-part inventions were greater than 
those used in his work as a whole, suggesting an intention on the part of the com- 
poser to maximize the separation of the voices in these works. If, on the other hand, 
voices in a polyphony are synchronous, what may result is a global timbre that 
comes from the fusion of the composite — though considerable precision is needed 
to achieve such a result. 
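
The role of onset timing can be made concrete with a small sketch. It treats the figure of roughly 30 ms quoted above as if it were a hard threshold, which is a simplifying assumption made only for illustration; the function name `group_onsets` and the example onset times are likewise hypothetical.

```python
def group_onsets(onsets_s, window_s=0.030):
    """Group onset times (in seconds) into candidate auditory events.

    An onset joins the current group when it falls within `window_s` of that
    group's first onset; otherwise it starts a new group. The hard 30 ms
    window is an illustrative assumption, not a perceptual constant.
    """
    groups = []
    for t in sorted(onsets_s):
        if groups and t - groups[-1][0] < window_s:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


# Three players attacking the same chord: a tight ensemble and a looser one.
tight = group_onsets([1.000, 1.008, 1.015])
loose = group_onsets([1.000, 1.020, 1.045])
print(len(tight), len(loose))
```

In the tight case all three attacks fall inside one window and are counted as a single event; in the looser case the last attack arrives 45 ms after the first and is counted separately, in line with the 30 to 50 ms range reported for trios.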

The other main grouping principle is that sound components tend to be per- 
ceived as a single entity when they are related by a common fundamental period. 
This is particularly the case if, when the fundamental period changes, all of the com- 
ponents change in similar fashion, as would be the case in playing vibrato, or in a 
single instrument playing a legato melody. Forced vibrating systems, such as blown 
air columns (wind instruments) and bowed strings, create nearly perfect harmonic 
sounds, with a strongly fused quality and an unambiguous pitch — in contrast to the 
several audible pitches of some inharmonic, free-vibrating systems such as a struck 
gong or church bell. This harmonicity-based fusion principle has again been used 
intuitively by composers of polyphonic music: a statistical analysis of Bach's key- 
board music by Huron (1991) showed that the composer avoided harmonic inter- 
vals in proportion to the degree to which they promote tonal fusion, thus helping to 
ensure voice independence. 

An important perceptual principle is demonstrated through such fusion: if 
sounds are grouped together, the perceptual attributes that arise — such as a new 
composite timbre — may be different from those of the individual constituent sounds,
and may be difficult to imagine merely from looking at the score or even at a spec- 
trogram. The principle that the perceived qualities of simultaneities depend on 
grouping led Wright and Bregman (1987) to examine the role of nonsimultaneous 
voice entries in the control of musical dissonance: they argued that the dissonant ef- 
fect of an interval such as a major seventh is much reduced if the voices composing 
the interval do not enter synchronously, and similar results also apply to fusion 
based on harmonicity (see McAdams 1999). All this demonstrates the need to con- 
sider issues of sonority in the perceptual analysis of pitch structures. 

Ravel's Bolero arguably offers a concrete example of intended
fusion. Up to bar 148, the main melody is played in succession by different
instrumental soloists. But at this point it is played simultaneously by five voices on 
three types of instrument: French horn, celesta, and piccolos (Figure 8.1b); the basic
melody is played by the French horn, and is transposed to the octave, 12th, double 
octave, and double octave plus a major third for the celesta (LH), piccolo (2), celesta 
(RH), and piccolo (1), respectively. Note that this forms a harmonic series and that 
these harmonic intervals are maintained since the transpositions are exact (so that 
the fundamental, octave, and double octave melodies are played in C major, the 12th 
melody in G major, and the double octave plus a third melody in E major). Ravel thus 
respects the harmonicity principle to the letter, and since all the melodies are also 
presented in strict synchrony, the resulting fusion — with the individual instrument 
identities subsumed into a single new composite timbre — depends only on accurate 
tuning and timing being maintained by the performers. This procedure is repeated 
by Ravel for various other instrumental combinations in the course of the piece, the 
consequent timbral evolution contributing to the global crescendo. 
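
The correspondence between these transpositions and a harmonic series can be checked with a short calculation. The sketch below assumes equal-tempered tuning of the written intervals (an assumption of this example, not a claim about any particular performance) and reports how far each transposition lies from the nearest harmonic of the horn melody; the function name `nearest_harmonic` is hypothetical.

```python
import math

# Transpositions named in the text, in semitones above the horn melody.
TRANSPOSITIONS = {
    "French horn (fundamental)": 0,
    "celesta LH (octave)": 12,
    "piccolo 2 (12th)": 19,
    "celesta RH (double octave)": 24,
    "piccolo 1 (double octave + major third)": 28,
}


def nearest_harmonic(semitones):
    """Return (harmonic number, deviation in cents) for an equal-tempered interval."""
    ratio = 2 ** (semitones / 12)
    harmonic = round(ratio)
    # Positive values mean the tempered interval is wider than the pure harmonic.
    cents_off = 100 * semitones - 1200 * math.log2(harmonic)
    return harmonic, cents_off


for part, semitones in TRANSPOSITIONS.items():
    harmonic, cents_off = nearest_harmonic(semitones)
    print(f"{part}: harmonic {harmonic}, {cents_off:+.1f} cents")
```

Under this assumption the five parts map onto harmonics 1 to 5. The octaves are exact, the 12th is about 2 cents narrow, and the double octave plus major third is about 14 cents wide of the fifth harmonic, which gives one indication of why the fusion depends on accurate tuning.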

An inverse example can be found in the mixed instrumental and electroacoustic 
work Archipelago by Roger Reynolds, for ensemble and four-channel computer- 
generated tape. In the tape part, recordings of the musical materials used elsewhere 
in the work by different instruments were analyzed by computer and resynthesized 
with modifications. In particular, the even and odd harmonics were either processed 
together as in the original sound, giving a temporally extended resynthesis of the 
same instrument timbre, or processed separately with independent vibratos and spa- 
tial trajectories, resulting in a perceptual fission into two new sounds. Selecting only 
the odd harmonics of an instrument sound leaves the pitch the same, but makes the 
timbre more "hollow" sounding, moving in the direction of a clarinet sound (which 
has weak even-numbered harmonics in the lower part of its frequency spectrum); 
selecting only the even harmonics produces an octave jump in pitch, since a series 
of even harmonics is the same as a harmonic series an octave higher. The perceptual 
result is therefore two new sounds with pitches an octave apart and timbres that are 
also different compared to the original sound. An example from Archipelago is the 
split of an oboe sound (Figure 8.17), which results in a clarinetlike sound at the orig- 
inal pitch and a soprano-like sound an octave higher. When the vibrato patterns are 
made coherent again, the sound fuses back into the original oboe. 
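
The octave relation between the two halves of the split follows from simple arithmetic, as the sketch below suggests. It uses the greatest common divisor of the partial frequencies as a crude stand-in for the pitch implied by a harmonic series; real pitch perception is far more tolerant than this, and the 440 Hz fundamental is an arbitrary value chosen for illustration rather than the pitch of the oboe note in Archipelago.

```python
from functools import reduce
from math import gcd


def implied_fundamental(partials_hz):
    """Greatest common divisor of integer partial frequencies, in Hz."""
    return reduce(gcd, partials_hz)


f0 = 440  # hypothetical fundamental, for illustration only
harmonics = [n * f0 for n in range(1, 11)]
odd = harmonics[0::2]   # harmonics 1, 3, 5, 7, 9
even = harmonics[1::2]  # harmonics 2, 4, 6, 8, 10

print(implied_fundamental(odd), implied_fundamental(even))
```

The odd-numbered partials still share the original fundamental, whereas the even-numbered partials are exactly the first five harmonics of a tone one octave higher.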

Figure 8.17. Splitting of an oboe sound in Roger Reynolds's Archipelago.
At around 3 seconds the odd-numbered harmonics start to have an independent
vibrato which grows and then decays in strength. A similar pattern occurs on the
even-numbered harmonics from about 5 seconds. Finally, each group swells in
vibrato, but with independent vibration patterns at around 9 seconds.
[Spectrogram: horizontal axis Time (s), 0 to 10; vertical axis ticks from 0 to 4500.]

Sequential sound organization concerns the integration of successive events
into auditory streams and the segregation of streams that appear to come from different
sources. In real-world settings, a stream generally constitutes a series of events
emitted over time by a single source. As we will see, however, there are limits to what
a listener can hear as an auditory stream, which does not always correspond to what 
real physical sources can actually do. So we can say that an auditory stream is a co- 
herent "mental" representation of a succession of events. The main principle that af- 
fects this mental coherence is a trade-off between the temporal proximity of succes- 
sive sound events and their relative similarity: the brain seems to prefer to connect 
a succession of sounds of similar quality which together create a perceptual conti- 
nuity. Continuity, however, is relative since a given difference between successive 
events may be perceived as continuous at slow tempi, but will be split into different 
streams at fast tempi. The main parameters affecting continuity include spectrotem- 
poral properties, sound level, and spatial position; continuous variation in all of 
these parameters gives a single stream, whereas rapid variation (particularly in all 
three together, as would often be the case for two independent sound sources play- 
ing at the same time) can induce the fission of a physical sequence of notes into two 
streams, one corresponding to the sequence emitted by each individual instrument. 

In order to illustrate the basic principles, let us examine spectrotemporal continuity,
which is affected by pitch and timbre change between successive notes.
Melodies played by a single instrument with steps and small skips tend to be heard 
as unified, with easily detectable pitch and rhythmic intervals, while rapid jumps 
across registers or between instruments may give rise to the perception of two or
more melodies being played simultaneously as illustrated in Figure 8.18 (an excerpt 
from a recorder piece by Telemann). In this case the perceived "melody" (i.e., the 
specific pattern of pitch and rhythmic intervals) corresponds to the relationships 
among the notes that have been grouped into a single stream or into multiple 
streams. Over the first six seconds of the excerpt shown, a listener will hear (and the 
spectrogram shows) a relatively slow ascending melody, a static pedal note, and a se- 
quence of more rapid three-note descending motifs. Because listeners often have 
great difficulty perceiving relationships across streams, such as rhythmic intervals 
and even relative temporal order of events, there can be some surprising rhythmic 
results from apparently simple materials. The example in Figure 8.19 illustrates how 
two interleaved, isochronous rhythms played by separate xylophone players can 
produce a complex rhythmic pattern (note the irregular spacing of the sound events 
in the 250 to 1,000 Hz range in the spectrogram) due to the way the notes from the 
two players are combined perceptually into a single stream with unpredictable dis- 
continuities in the melodic contour. 
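
The trade-off between temporal proximity and similarity can be illustrated with a deliberately simple toy model. Everything in it is an assumption made for the sake of the sketch: pitch distance stands in for similarity in general, the permitted jump grows linearly with the time gap, and the rate of 40 semitones per second is an arbitrary value rather than a measured perceptual boundary. The function names are hypothetical.

```python
def assign_streams(notes, semitones_per_second=40.0):
    """Greedy toy model of auditory stream formation.

    `notes` is a list of (onset_s, midi_pitch) pairs in time order. Each note
    joins the stream whose last note is closest in pitch, provided the jump is
    no larger than `semitones_per_second * gap`, where `gap` is the time since
    that stream's last note; otherwise it starts a new stream.
    """
    streams = []
    for onset, pitch in notes:
        best = None
        for stream in streams:
            last_onset, last_pitch = stream[-1]
            jump = abs(pitch - last_pitch)
            allowed = semitones_per_second * (onset - last_onset)
            if jump <= allowed and (best is None or jump < best[0]):
                best = (jump, stream)
        if best is None:
            streams.append([(onset, pitch)])
        else:
            best[1].append((onset, pitch))
    return streams


def alternating(ioi_s, low=60, high=72, count=8):
    """A line alternating between two pitches an octave apart."""
    return [(i * ioi_s, low if i % 2 == 0 else high) for i in range(count)]


print(len(assign_streams(alternating(0.5))))  # slow tempo
print(len(assign_streams(alternating(0.1))))  # fast tempo
```

With two notes per second the octave leaps stay within the permitted jump and the model keeps a single stream; at ten notes per second the same physical sequence is split into a low stream and a high stream, mirroring the tempo dependence described above.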

Once the acoustic waveform has been analyzed into separate source-related 
events, the auditory features of the events can be extracted. These musical qualities 
depend on various acoustic properties of the events, of which the most important 
are spatial location, loudness, pitch, and timbre. Each of these will be considered 
separately. 

Figure 8.18. Spectrogram of an excerpt of a recorder piece by Telemann. The fundamental
frequencies of the recorder notes lie in the range 400 to 1,500 Hz.

Figure 8.19. Spectrogram of a rhythm played by separate players on an African
xylophone.



The spatial location of an event depends on several kinds of cues in the sound. 
In the first place, since we have two ears that are separated in space, the sound that 
arrives at the two ear drums depends on the position of the sound source relative to 
the listener's head, and is different for each ear: a sound coming from one side is both 
more intense and arrives earlier at the closer ear. Also the convoluted, irregular 
shape of the outer part of the ear (the pinna) creates position-dependent modifica- 
tions of the sound entering the ear canal, and these are interpreted by the brain as 
cues for localization. Second, and more difficult to research, are the cues that allow 
us to infer the distance of the source (Grantham 1995). There are several possible 
acoustic cues for distance: one is the relative level, since level decreases as a function 
of distance, while another is the relative amount of reverberated sound in the envi- 
ronment as compared with the direct sound from the source (the ratio of reverber- 
ated to direct sound increases with distance). Finally, since higher frequencies are 
more easily absorbed and/or dispersed in the atmosphere than are lower frequen- 
cies, the spectral shape of the received signal can also contribute to the impression 
of distance. Such binaural, pinna, and distance cues are useful in virtual reality dis- 
plays and in creating spatial effects in electroacoustic music. 
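
Two of these cues can be given rough orders of magnitude. The sketch below uses Woodworth's spherical-head approximation for the interaural time difference and the free-field inverse-square law for the level of the direct sound. Neither formula comes from this chapter; the head radius and speed of sound are typical textbook values, and real rooms and real heads will deviate from both.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C
HEAD_RADIUS = 0.0875    # m, a commonly used average (an assumption here)


def interaural_time_difference(azimuth_deg):
    """Woodworth's spherical-head approximation, for azimuths of 0 to 90 degrees."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


def level_drop_db(near_m, far_m):
    """Free-field drop in direct-sound level between two source distances."""
    return 20 * math.log10(far_m / near_m)


print(f"{interaural_time_difference(90) * 1e6:.0f} microseconds")
print(f"{level_drop_db(1, 2):.1f} dB")
```

On these assumptions a source directly to one side reaches the nearer ear roughly two thirds of a millisecond earlier, and each doubling of distance lowers the direct sound by about 6 dB.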

For simple sounds, loudness corresponds fairly directly to sound level; but for 
complex sounds, the global loudness results from a kind of summation of power 
across the whole frequency range. It is as if the brain adds together the activity in all 
of the auditory nerve fibers that are being stimulated by a musical sound to calculate
the total loudness. When several sounds are present at the same time and their
frequency spectra overlap, a louder sound can cover up a softer sound either par- 
tially, making it even softer, or totally, making it inaudible: this process is called 
masking, and seems to be related to the neural activity of one sound swamping that 
of another. Masking may be partially responsible for the difficulty in hearing out 
inner voices when listening to polyphonies with three or more voices. Loudness
is also affected by duration: a very short staccato note (say around 50 ms) with the
same physical intensity as a longer note (say around 500 ms) will sound softer. This 
seems to be because loudness accumulates over time, and the accumulation process 
takes time: for a long steady note, the perceived loudness stabilizes after about 
200 ms. This principle is useful in instruments that produce sustained notes over 
whose intensity no control is possible, but whose duration can be controlled. For ex- 
ample, the production of agogic accents on the organ is obtained by playing certain 
notes slightly longer than their neighboring notes. 

For any harmonic or periodic sound, the main pitch heard corresponds to the 
fundamental frequency, though this perceived pitch is the result of a perceptual syn- 
thesis of the acoustic information, rather than the analytic perception of the fre- 
quency component corresponding to the fundamental. (One can listen to a low- 
register instrument playing in the bottom of its tessitura over a small transistor radio 
and still hear the melody being played at the correct pitch, even though the spec- 
trum of the signal shows that all of the lower-order harmonics are missing due to the 
very small size of the loudspeaker in the radio.) But many musical sound sources 
that are not purely harmonic (including carillon bells, tubular bells, and various per- 
cussion instruments) still give at least a vague impression of pitchedness; it seems 
that pitch perception is not an all-or-none affair, so that perceived pitch can be more 
or less strong or salient. For example, try singing a tune just by whispering and not 
using your vocal cords: you will find that you change the vowel you are singing to
produce the pitch, which suggests that a noise sound with a prominent resonance 
peak can produce enough of a pitch percept to specify recognizable pitch relations 
between adjacent sound events. Similarly the sound processing techniques used in 
electroacoustic and pop music can create spectral modifications of broadband noise 
(such as crowd or ocean sounds) with a regular series of peaks and dips in the spec- 
trum: if the spacing between the centers of the noise peaks corresponds to a har- 
monic series, a weak pitch is heard, allowing musicians to "tune" noise sounds to 
more clearly pitched harmonic sounds. 

Finally, timbre is a vague term that is used differently by different people and even 
according to the context. The "official" scientific definition is a nondefinition: the at- 
tribute of auditory sensation that distinguishes two sounds that are otherwise equal 
in terms of pitch, duration, and loudness, and that are presented under similar con- 
ditions (presumably taking into account room effects and so on). That leaves a lot of 
room for variation! Over the last 30 years, however, a new approach to timbre per- 
ception has developed, which allows psychoacousticians to characterize more sys- 
tematically what timbre is, rather than what it is not. Using special data analysis 
techniques, called multidimensional scaling (Plomp 1970, Grey 1977, McAdams et 
al. 1995), researchers have been able to identify a number of perceptual dimensions 
that constitute timbre, allowing a kind of deconstruction of this global category into 
more precise elements. Attributes in terms of which timbres may be distinguished
include the following: 

• Spectral centroid (visible in a spectral representation: the relative weight of 
high and low-frequency parts of the spectrum, a higher centroid giving a 
"brighter" sound). 

• Attack quality (visible in a temporal representation, and including the attack 
time and the presence of attack transients at the beginning of a sound). 

• Smoothness of the spectral envelope (visible in a spectral representation: the 
clarinet has strong odd-numbered harmonics and weak even-numbered har- 
monics, giving a ragged spectral envelope). 

• Degree of evolution of the spectral envelope over the course of a note (visible 
in a time-frequency representation: some instruments, like the clarinet, have a 
fairly steady envelope, whereas others have an envelope that opens up toward 
the high frequencies as the intensity of the note increases, as in the case of 
brass instruments). 

• Roughness (visible in a temporal representation: smooth sounds have very 
little beating and fluctuation, whereas rough sounds are more grating and 
inherently dissonant). 

• Noisiness/inharmonicity (visible in a spectral representation: nearly pure har- 
monic sounds, like blown and bowed instruments, can be distinguished from 
inharmonic sounds like tubular bells and steel drums, or from clearly noisy 
sounds such as those of crash cymbals and snare drums). 

A greater understanding of the relative importance of these different "dimensions" of 
timbre may help musicologists develop systematic classification systems for musical 
instruments and even sound effects or electroacoustic sounds. 
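
Of the attributes listed above, the spectral centroid is the simplest to compute. The sketch below uses one common definition, the amplitude-weighted mean frequency of the partials; other definitions weight by power or work on a perceptual frequency scale. The two spectra are invented for illustration and do not represent measured instrument tones.

```python
def spectral_centroid(partials):
    """Amplitude-weighted mean frequency of (frequency_hz, amplitude) pairs."""
    total = sum(amp for _, amp in partials)
    return sum(freq * amp for freq, amp in partials) / total


def harmonic_tone(f0, amplitudes):
    """Pair the harmonic frequencies n * f0 with the given amplitudes."""
    return [(n * f0, amp) for n, amp in enumerate(amplitudes, start=1)]


# Two made-up spectra on the same 220 Hz fundamental.
dull = harmonic_tone(220, [1.0, 0.5, 0.25, 0.125])
bright = harmonic_tone(220, [1.0, 0.9, 0.8, 0.7])

print(round(spectral_centroid(dull), 1), round(spectral_centroid(bright), 1))
```

The tone whose upper harmonics decay slowly has the higher centroid, which on the account given above corresponds to the "brighter" of the two sounds even though both share the same pitch.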

Analytical Application 

As an example of the way in which psychoacoustical principles can be empirically 
applied to the analysis of pitch and timbre, Tsang (2002) uses a number of perceptually
based approaches to analyze the structure of Farben, the third of Schoenberg's
Five Orchestral Pieces, Op. 16, which is celebrated for its innovative use of orchestral
timbre. Taking principles developed by Parncutt (1989) for estimating the
salience of individual pitches, and by Huron (2001) for explaining voice-leading in 
perceptual terms, Tsang discusses the perceptibility of the canonic structure of the 
opening section of Farben. By applying Parncutt's pitch salience algorithm (a formula 
used to calculate how noticeable any given pitch is in the context of other simulta- 
neous pitches), Tsang concludes that "Schoenberg's choice of pitches ensures that 
relatively strong harmonic components often draw the listener's attention to the 
canonic voices that are moving or are about to move" (Tsang 2002: 29). Huron's 
perceptual principles relating to voice-leading, which take into account a wider va- 
riety of psychoacoustical considerations than pitch alone, partially support this con- 
clusion, suggesting that Schoenberg tailored his choices of orchestration so as to 
bring out the canonic movement, but that other factors serve to disguise the canon. 
At the end of his study, Tsang notes that the different attentional strategies listeners
bring to bear on the music will inevitably result in different perceptual experiences,
as will comparatively slight differences of interpretation on the part of conductors
and orchestras, particularly in a piece that seems to place itself deliberately
at the threshold of perceptual discriminability. These considerations suggest a high
level of indeterminacy between what a perceptually informed analysis might suggest 
and what any particular listener may experience — an indeterminacy that would be 
damaging to a narrowly descriptive (let alone rigidly prescriptive) notion of the re- 
lationship between analysis and experience. But as many authors have pointed out 
(e.g., McAdams 1984, Cook 1990), to propose such a tight linkage is neither neces- 
sary nor even desirable. 

A further example of an attempt to relate perceptual principles to musicologi- 
cal concerns is provided by Huron (2001). The goals of this ambitious paper are "to 
explain voice-leading practice by using perceptual principles, predominantly prin- 
ciples associated with the theory of auditory stream segregation . . .", and "to iden- 
tify the goals of voice-leading, to show how following traditional voice-leading rules 
contributes to the achievement of these goals, and to propose a cognitive explana- 
tion for why the goals might be deemed worthwhile in the first place" (Huron 2001: 
2-3). As this makes clear, perceptual principles are being used here to address not 
only matters of compositional practice, but also aesthetic issues. The form of the 
paper is first to present a review of accepted rules of voice-leading for Western art 
music; second to identify a number of pertinent perceptual principles; third to see 
whether the rules of voice-leading can be derived from the perceptual principles; 
fourth to introduce a number of auxiliary perceptual principles which provide a per- 
spective on different musical genres; and finally to consider the possible aesthetic 
motivations for the compositional practices that are commonly found in Western 
music and which do not always simply adhere to the perceptual principles that 
Huron identifies. 

Huron makes use of six perceptual principles in the central part of the paper, 
each of which is supported with extensive empirical evidence from auditory and 
music perception research, and is shown to correspond to compositional practice 
often sampled over quite substantial bodies of musical repertoire (using Huron's 
Humdrum software — see chapter 6, this volume). To give some idea of what the per- 
ceptual principles are like, and how they are used to derive voice-leading rules, con- 
sider as an example the third perceptual principle, which Huron calls the "minimum 
masking principle": "In order to minimize auditory masking within some vertical 
sonority, approximately equivalent amounts of spectral energy should fall in each 
critical band. For typical complex harmonic tones, this generally means that simul- 
taneously sounding notes should be more widely spaced as the register descends." 
(Huron 2001: 18) 

In support of this principle, Huron assembles a considerable amount of evi- 
dence from well-established psychoacoustical research dating back to the 1960s 
showing that pitches falling within a certain range of one another (i.e., the "critical 
band," which roughly corresponds to the bandwidth of the auditory filters in the 
cochlea) both tend to obscure ("mask") one another, and interact to create a sense of 
instability or roughness (often referred to as "sensory dissonance"). When Huron 
goes on to derive perceptually-based voice-leading rules, the "minimum masking
principle" is used to motivate two rules, one traditional, and one which Huron calls 
"nontraditional"; "nontraditional" rules are those that seem to follow from percep- 
tual principles, but are not acknowledged as explicit voice-leading rules in standard 
texts. The traditional rule is stated as follows (Huron 2001: 33): "Chord Spacing 
Rule. In general, chordal tones should be spaced with wider intervals between the lower 
voices." The nontraditional rule (ibid.) is: "Tessitura-Sensitive Spacing Rule. It is more 
important to have large intervals separating the lower voices in the case of sonorities that 
are lower in overall pitch." Huron refers to work showing that this rule, although 
not explicitly recognized in standard texts on voice-leading, is adhered to in musi- 
cal practice. The five other perceptual principles work in a similar fashion, generat- 
ing individually, and in combination with one another, a total of 22 voice-leading 
rules, of which nine are traditional. All of the 13 nontraditional rules are found to be 
supported by compositional practice, demonstrating rather convincingly how a per- 
ceptually-based approach can reveal implicit, but previously unrecognized, compo- 
sitional principles. 
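
The reasoning behind the chord spacing rule can be sketched numerically. The example below compares the same interval, a major third, in four registers, expressing the distance between the two fundamentals in units of the auditory filter bandwidth at that register. The bandwidth formula is the widely used equivalent rectangular bandwidth approximation usually attributed to Glasberg and Moore; it is an assumption of this sketch and not necessarily the measure Huron relies on, and the calculation considers fundamentals only, ignoring the upper harmonics of real complex tones.

```python
def erb_hz(center_hz):
    """Equivalent rectangular bandwidth of the auditory filter, in Hz."""
    return 24.7 * (4.37 * center_hz / 1000 + 1)


def midi_to_hz(midi):
    """Equal-tempered frequency with A4 (MIDI 69) at 440 Hz."""
    return 440.0 * 2 ** ((midi - 69) / 12)


def separation_in_bandwidths(midi_low, midi_high):
    """Distance between two fundamentals, in filter bandwidths at their midpoint."""
    lo, hi = midi_to_hz(midi_low), midi_to_hz(midi_high)
    return (hi - lo) / erb_hz((lo + hi) / 2)


for name, low in [("C2-E2", 36), ("C3-E3", 48), ("C4-E4", 60), ("C5-E5", 72)]:
    print(name, round(separation_in_bandwidths(low, low + 4), 2))
```

On this approximation the third C2 to E2 spans only about half a bandwidth, whereas C5 to E5 spans about one and a half, so the same written interval is far more likely to produce masking and roughness in the bass, consistent with spacing lower voices more widely.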

Having derived the voice-leading principles, Huron introduces four additional 
perceptual principles which are used to address questions of musical genre. Again, 
to get a flavor of what is involved, consider just one of the four (Huron 2001: 49):
"Timbral Differentiation Principle. If a composer intends to write music in which the 
parts have a high degree of perceptual independence, then each part should maintain a 
unique timbral character." The striking thing about this principle, as Huron points 
out, is the extent to which it is ignored in compositional practice. Although wind 
quintets and other small mixed chamber ensembles show significant differentiation, 
string quartets, brass ensembles, madrigal groups, and keyboards all make use of 
timbrally undifferentiated textures for polyphonic purposes. Why is this? Huron 
suggests that there may be a number of factors. One is pragmatic: it may simply have 
been more difficult for composers to assemble heterogeneous instrumental groups, 
and so the goal of distinguishable polyphony was bracketed or abandoned in favor 
of practical possibility. A second reason may be the operation of a contrary aesthetic 
goal: Huron suggests that composers tend to prefer instrumental ensembles that 
show a high degree of "blend," and homogeneous instrumentation may be one way 
to achieve this. And a third reason may be balance: it is much harder to achieve an 
acceptable balance between voices in a very diverse instrumental group, and com- 
posers may have decided that this was a more important goal. 

It is interesting in this regard that Schoenberg's practice, from the middle period 
of his atonal style, of indicating instrumental parts as "Hauptstimme" and "Nebenstimme"
(main voice and subsidiary voice) was motivated by a concern that the correct
balance between instrumental parts in his chamber and orchestral works might
not be attained; given the dramatic explosion of writing for mixed chamber ensem- 
bles in the twentieth century, influenced strongly by the ensemble that Schoenberg 
used in Pierrot Lunaire, this does seem to be a recognition of the balance problem 
that Huron identifies. Equally, the striking way in which Webern uses timbral dif- 
ferentiation to cut across the serial structure in the first movement of the Symphony, 
Op. 21, provides support (through counterevidence) for the timbral differentiation 
principle: timbral identity and differentiation disguise the serial structure, instead 
superimposing a different structure that is articulated by timbre itself, a timbral
palindrome. 

The Webern example already demonstrates the complex interaction between 
compositional and aesthetic goals on the one hand, and perceptual principles on the 
other — and it is this subject that the final part of Huron's paper addresses. Huron 
proposes that achieving perceptual clarity in perceptually challenging contexts (e.g., 
finding hidden objects in visual arrays, as exploited in many children's puzzles) is an 
intrinsically pleasurable and rewarding process, and that this is one way to under- 
stand why voice-leading rules and compositional practice both conform to, and flout, 
perceptual principles. If all music simply adhered to perceptual imperatives, then 
there would be little motivation to move beyond the most straightforward mon- 
ophony. But social considerations (the need, or desire, to develop musical styles in 
which groups of singers or instrumentalists with different pitch ranges, timbral qual- 
ities or dynamic characteristics can sing and play together), aesthetic goals and a 
whole range of other factors have resulted in the historical development of an enor- 
mous variety of textures and styles. One strand within this, Huron suggests, is the 
possibility that some multipart music is organized deliberately to challenge the lis- 
tener's perceptual capacities — precisely because of the pleasure that can be gained 
from successfully resolving these complex textures. 

Early Renaissance polyphonists discovered [that] ... by challenging the 
listener's auditory parsing abilities, the potential for a pleasing effect could 
be heightened. However, this heightened pleasure would be possible only if 
listeners could indeed successfully parse the more complex auditory scenes. 
Having increased the perceptual challenge, composers would need to take 
care in providing adequate streaming cues. Following the rules of voice- 
leading and limiting the density of parts . . . might be essential aids for 
listeners. (Huron 2001: 57) 

What is striking about this discussion is that it brings together perceptual prin- 
ciples based on extensive empirical support, aesthetic considerations, and a rather 
different perspective on music history in a way that manages to avoid the potential 
pitfalls of a perceptual determinism. Huron's final paragraph is significant for the 
care with which it recognizes that perceptual principles act neither as the arbiters of 
musical value, nor as constraints on future creativity. Noting that his interpretation 
of the aesthetic origins of voice-leading "should in no way be construed as evidence 
for the superiority of polyphonic music over other types of music," he continues 
(Huron 2001: 58): 

In the first instance, different genres might manifest different perceptual goals 
that evoke aesthetic pleasure in other ways. Nor should we expect pleasure 
to be limited only to perceptual phenomena. As we have already emphasized, 
the construction of a musical work may be influenced by innumerable goals, 
from social and cultural goals to financial and idiomatic goals. . . . The identi- 
fication of perceptual mechanisms need not hamstring musical creativity. On 
the contrary, it may be that the overt identification of the operative perceptual 
principles may spur creative endeavors to generate unprecedented musical 
genres. 

Conclusion

In this chapter, we have tried to show how acoustical and perceptual analyses can 
supplement and "animate" an analytical understanding drawn from scores and 
recordings in important ways. The representations and analytical methods consid- 
ered here can be a significant key to a more sound-based understanding of music 
than the score-reading approach has traditionally encouraged, and in this way at- 
tributes of musical structure and process that remain hidden from view in the score, 
and that often pass by too rapidly and with too much complexity in performance or 
recording, can be brought to light and given appropriate consideration. But as we 
have seen, there are still problems to be overcome: as the figures in this chapter 
demonstrate, representations of sound often contain large amounts of information, 
with the result that it can be difficult to strike an effective balance between analyz- 
ing musically appropriate stretches of material and risking information overload on 
the one hand, and focusing on a frustratingly tiny fragment of music on the other. 
Although the analysis of musical sound is still in early stages of development, and 
more powerful summarizing tools will no doubt be developed in the future, in the 
long run there may be no easy solution to a problem which is a testimony to the ex- 
ceptional power of human perception. 

As the example from Tsang (2002) has already made clear, the kind of approach 
discussed in this chapter is necessarily generic, and unable to explain individual lis- 
tening experiences — even if it brings new tools with which to illustrate those expe- 
riences. It is, after all, based on a "culture-free" approach in which the salience and 
impact of events, for example, is based solely on their acoustical and perceptual 
properties and not on their cultural resonances or semiotic significance. A well- 
established finding in the psychology of perception, sometimes referred to as the 
"cocktail party phenomenon," demonstrates that when people are attending to mul- 
tiple sound sources, a source that has special significance for them (such as their 
name, or an emotionally charged word) will catch their attention even when it com- 
petes with other sound sources that may be considerably more salient (louder, 
nearer, timbrally more prominent). Thus the analyst is in no measure freed by this 
wealth of empirical data from either the responsibility or the opportunity to explore, 
and try to explain, why music might be heard or understood in particular ways. 
Nonetheless, acoustical and perceptual analyses can usefully complement more cul- 
turally oriented approaches by providing a rich source of information on which the 
latter might be based, and in terms of which they are certainly grounded. As with 
any empirical approach, the value of such an outlook is not in the data that it may 
accumulate but in the way in which data rub up against theory — formal or infor- 
mal — in ways that may be supportive and confirmatory, or uncomfortable and mind 
changing. 

CHAPTER 9



Data Collection, Experimental Design, 
and Statistics in Musical Research 

W. Luke Windsor 

Introduction 

This chapter provides a brief introduction to the ways in which musical research has 
drawn upon the quantitative methods of the empirical social sciences. The past 25 
years have seen increasing moves toward the use of such methods in musical research,
especially in the domain that has become variously known as "music psychology,"
"psychology of music," "music cognition," "music perception," or even
"psychomusicology." Although these methods can be applied directly to musical 
data derived from a score, this chapter focuses upon quantitative analytical tech- 
niques that can be applied to musical events that involve either listeners or per- 
formers. Research on music perception and performance can be carried out using 
quite standard statistical and experimental methods, but often requires novel ap- 
proaches to their application. 

In carrying out an empirical study, hypotheses, or at least some concrete re- 
search questions, must be generated before comparing, describing, coding, or col- 
lecting data. This is because it is only in the light of such hypotheses that you can 
decide precisely what data are relevant, and how irrelevant factors are to be ex- 
cluded: to this extent the approach must be top-down, rather than bottom-up. 
Hence, although the first practical step in doing empirical research is observation, a 
prior conceptual step should be an informed decision about what to observe, how to 
quantify it, and how to analyze the resulting data. It is pointless collecting data that 
turn out to be inappropriate for analysis, or which fail to provide evidence that can 
be used to support or challenge the relevant arguments. 

However, not all quantitative research need be experimental in this classical 
sense. It is perfectly acceptable to collect data in a more exploratory manner as long 
as it is recognized that it may be hard to understand the relationship between dif- 
ferent variables. The "real world" is a complex place, and laboratory researchers 
often pay a price for ensuring that their experimental results are easy to interpret. 
This price is loss of "realism" or "ecological validity," and can result in findings that 
only hold under extremely unusual and constrained circumstances (such as those 
within a laboratory). It may be convenient for analytical purposes to take into ac- 
count only certain things, such as, for example, the duration and pitch-class of 
events in melodic sequences, but there is a danger of finding out too late that some 
other factor, such as melodic contour, was a relevant variable. An experimental approach
tends to be reductive, in that it reduces the number of factors involved so as
to show more clearly their influence on one another. There is always a danger that 
such a reductive approach changes the observed phenomenon so much that the 
findings are hard to apply to the real world. 

This chapter is organized so as to mirror the different stages a piece of empiri- 
cal research might involve. First, I consider issues in the collection of data: this sec- 
tion summarizes the different kinds of quantitative data that can be gathered from 
performers and listeners, and suggests appropriate methods of data collection. The 
second section explores various methods of organizing and transforming data. It is 
often the case that data are noisy, or in the wrong format for a particular kind of 
analysis: they may need to be systematically filtered, or encoded. The third section 
outlines some basic statistical techniques for describing and summarizing quantita- 
tive data, including visual and other methods for representing data in such a way as 
to draw general conclusions. The fourth section explores the discovery and mea- 
surement of trends in data, with examples of methods that allow the comparison of 
two or more sets of data. The final section returns to issues of experimental method 
in relation to hypothesis testing, compared to less strict and more exploratory ap- 
proaches to empirical research. 



Collecting Data from Listeners and Performers 

It seems self-evident that a sensible way of learning about music might be to observe 
and measure the behavior of performers and listeners. In the classical Western tra- 
dition, for example, expert listeners (themselves often expert performers) must se- 
lect the winners in prestigious competitions, and teachers of music must diagnose 
problems in listening and performance. Musicians of all kinds must self-diagnose 
their shortcomings in order to improve, and must be able to apply sophisticated per- 
ceptual and cognitive skills to succeed in coordinating their ensemble performances. 
Some forms of observation and measurement are judgmental, while some are rather 
less value-laden, but it would be quite wrong to assume that an empirical attitude to 
musical behavior is something alien to everyday musical life. 

The direct collection of quantitative data, and the quantification of more quali- 
tative observations, is familiar to any musician who works within a structured edu- 
cational context. It is common for performers to be assessed by examinations that 
provide not only written feedback but also a breakdown of numerical marks in different
areas of performance (such as the ability to play scales and arpeggios, to perform pieces,
and to sight-read). The total number of marks acquired in such an examination is 
intended to express their standard of performance at a particular level: a qualitative 
assessment of performance becomes quantitative, and judgments are then made 
about the relative success of different candidates. In a more subtle sense, audition 
panels and competition juries regularly translate their qualitative assessments of per- 
formers into rankings, which also represents a move away from qualitative toward 
quantitative empiricism. Such ratings and rankings of musical skill are just one way 
in which quantitative data might be collected from musicians. 

Before moving on to a consideration of different types of musical data, a basic
distinction can be made between more or less direct methods of collection. For ex- 
ample, if one imagines a continuum between observing a pianist playing and asking 
the pianist to fill out a questionnaire about their performance, it is easy to see that the 
former might represent a more direct method of collecting data about the perform- 
ance itself. The questionnaire might be a direct way of gathering data on the pianist's 
motivations, but hardly on his or her performance as such, since there are intervening 
effects of memory and interpretation — which may be interesting in themselves, but 
which represent a considerable move away from direct measurement and observa- 
tion. While the relative directness of methods will change with context (measuring a 
performance is a very indirect way of discovering which edition of the music has been 
played as compared to asking the performer), the number of stages of interpretation 
through which information must pass before analysis should always be considered. 
This chain of interpretation must be taken into account when analyzing data: each 
stage introduces uncertainty — which may be necessary, but must be noted. If a study 
of performance were to be based on the judgments of listeners, it would have to take
into account the possible biases and unreliability of the judges, whereas a study based 
on Musical Instrument Digital Interface (MIDI) data gathered from performances need 
not do so. Of course, the latter would say nothing about whether anything measured 
had any relevance to listeners: directness is no guarantee of appropriate design. 

Both direct and indirect methods of measurement have been described in the 
preceding chapters. Many studies of performance have more or less directly mea- 
sured the timing and magnitude of musicians' movements through analysis of video 
data (e.g., Davidson 1993). Timing and intensity data from performances have been 
directly gathered from specially modified conventional instruments (e.g., Seashore
1938; Shaffer 1981), from MIDI instruments (e.g., Palmer 1989), or from analysis
of audio signals (e.g., Repp 1992). Digital signal processors also make available a
wealth of detailed data about the internal structure of sounds, useful in the analysis 
of instrumental or vocal timbre (see chapter 8, this volume). Although most of these
techniques are primarily suited to the analysis of performance, perceptual issues can 
also be studied, as long as the indirect nature of the response is taken into account. 
A listener might be asked to imitate a musical sequence (Clarke 1993), or to tap 
along to it (Repp 1999), so providing data relevant to the study of perception. More 
direct methods of collecting data about performance might include the measure- 
ment or observation of changes in physiological state; for example, common mea- 
sures used in studies of performance anxiety include heart rate, blood pressure and 
skin conductivity (Abel and Larkin 1990). Again, both perception and production 
studies have begun to measure the spatial location and temporal pattern of brain ac- 
tivity, whether measured in terms of temperature, electrical activity or blood flow 
(see, for example, Besson and Faita 1995). 

Rather more indirect methods may also be appropriate, however. When study- 
ing perceptual phenomena, it may be most practical to collect the responses to some 
form of listening test, where listeners have to make their response either on paper or 
via computer software. For example, a study might investigate how similar a listener 
thinks two musical events are (e.g., Grey 1977), or how well one event fits with another
(e.g., Krumhansl and Kessler 1982): here the data might take the form of numbers
corresponding to the magnitude of the perceived similarity or goodness of fit
along some scale. Similarly, a task might require subjects to identify an event, or dis- 
criminate between different events (e.g., Windsor 1993), and again the data can be 
represented quantitatively. In a different context, more complex psychometric tests 
might be employed, such as those that purport to assess musical ability (for instance 
the Seashore Measures of Musical Talents), standard of performance (such as the As- 
sociated Board of the Royal Schools of Music examinations in the U.K.), or anxiety 
levels (the State-Trait Anxiety Inventory), all of which give a numerical score or set
of scores. Using preexisting tests has advantages and disadvantages: such tests are 
often well standardized for particular populations, but just for this reason they may 
be misleading if applied outside this group of people. Other measures of training or 
ability might also be made: simply counting the number of years of musical in- 
volvement can be used as a rough measure of certain types of musical skill, for 
example, especially since there is now strong empirical evidence for a direct rela- 
tionship between the quantity of practice and resulting musical expertise (Sloboda, 
Davidson, Howe, and Moore 1996).

One approach which I will do no more than touch on (but which is discussed 
in chapter 4, this volume) is the questionnaire study. Although rather indirect and 
sometimes difficult to verify, questionnaires can be an extremely good source of cer- 
tain kinds of quantitative data, such as that deriving from self-assessment of the time 
spent practicing (Sloboda, Davidson, Howe, and Moore 1996), or from self-coding 
of attitudes or states of mind. The standard text on formulating questionnaires is Op- 
penheim (1966); although now quite old, it contains excellent advice on how to find 
out about people's beliefs and attitudes through questionnaires. 

Data Types and Variables 

Before deciding upon a method of collecting data and investing in the equipment 
and software needed, it is necessary to make an informed choice about what kind of 
data to collect, and in which form it would most usefully be encoded. It is also im- 
portant at this stage to take account of the statistical consequences of one's choices. 
Different types of data require and allow for different types of statistical test, and an 
attempt will be made here to outline some of the consequences of choosing partic- 
ular types of quantitative variable. Sometimes the same thing can be measured in dif- 
ferent ways, and the choice may be influenced by the kinds of intended analysis. 

A primary distinction is between three types of data: nominal, or category data; 
ordinal data; and continuous data. Nominal data are differentiated only in name, not in 
magnitude, and each data point, or observation, is represented by one of a number 
of symbols. A simple example of this might be the categorization of listeners in a study 
into musicians and nonmusicians, perhaps using a questionnaire. The possibility that 
there might be a continuum between musicians and nonmusicians, and that their de- 
gree of musicianship might be quantifiable, is disregarded in this nominal measure — 
an approach that may or may not be justified according to the circumstances of the 
individual study. Some categorical distinctions seem more "natural" than others, 
such as that between male and female, and therefore more appropriate for nominal 
representation. However, even that might not always be appropriate: a study that assessed
listeners' masculinity might need to allow for intermediate values between
male and female and a way of expressing their relative magnitude. 

Table 9.1 shows some imaginary data collected from 20 listeners. There are two 
variables, listener and training. The former is nominal, and has 20 categories, one 
for each listener. The latter is represented in three different ways in the table, in the 
first instance as two (nominal) categories. Note that although the listeners have been 
categorized using numerals, the magnitude of these numerals is meaningless since 
this variable is nominal. However, the data still lend themselves to certain kinds of 
quantitative analysis: for instance, measuring the number of musicians in the group, 
their frequency, and comparing this with the number (frequency) of nonmusicians.
By measuring and comparing frequencies one can create quite sensitive and inform- 
ative analyses: some simple examples are covered later in this chapter. Frequency 
data are often derived from questionnaires, where each question has a number of 
choices such as "yes," "no," and "not sure," or questions such as "are you male or fe- 
male?," where only certain alternatives are available. 
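
The frequency comparison described above can be sketched in a few lines of Python. This is only an illustration, assuming the nominal training column of Table 9.1 has been typed in by hand in listener order; the variable names are hypothetical and the chapter itself does not prescribe any particular software.

```python
from collections import Counter

# Nominal "training" column of Table 9.1, listeners 1 to 20 in order.
training_nominal = [
    "musician", "non-musician", "musician", "musician", "non-musician",
    "non-musician", "musician", "musician", "non-musician", "musician",
    "musician", "non-musician", "non-musician", "musician", "musician",
    "musician", "musician", "musician", "non-musician", "musician",
]

# Frequency: the number of observations that fall into each category.
frequencies = Counter(training_nominal)

# Relative frequency: each category's share of the whole group.
total = len(training_nominal)
proportions = {label: count / total for label, count in frequencies.items()}

for label, count in frequencies.items():
    print(f"{label}: {count} of {total} ({proportions[label]:.0%})")
```

Because the labels are nominal, counting is the only arithmetic applied to them: nothing in the code treats "musician" as greater or smaller than "non-musician."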

Table 9.1. Imaginary data on the level of training of 20 listeners, presented in three
different ways.

            Training           Training         Training
            (Nominal:          (Ordinal:        (Continuous:
Listener    two categories)    three levels)    number of years)

1           musician           professional     20
2           non-musician       untrained        2
3           musician           amateur          18
4           musician           amateur          12
5           non-musician       untrained        1
6           non-musician       untrained        3
7           musician           amateur          5
8           musician           professional     13
9           non-musician       untrained
10          musician           amateur          10
11          musician           amateur          12
12          non-musician       untrained        1
13          non-musician       untrained        2
14          musician           professional     40
15          musician           professional     28
16          musician           professional     25
17          musician           professional     30
18          musician           amateur          35
19          non-musician       untrained        5
20          musician           professional     32

Ordinal data are differentiated from nominal data in that they include a notion
of rank order. Instead of asking a group of subjects whether they are musically
trained or not, they might be asked to rate their expertise using three categories:
"untrained," "amateur," and "professional," as shown in the second column in Table 9.1.
An ordinal coding assumes that the levels of expertise among these three groups are 
not only different in kind, but can also be rank ordered. If it is assumed (though this 
assumption may be incorrect, see below) that untrained subjects have the lowest 
level of expertise, amateurs a moderate level, and professionals the highest, then it 
is straightforward enough to translate these categorical variables into the levels of an 
ordinal variable. Ordered from smallest to largest the three levels of this variable are 
"untrained," "amateur," and "professional": the variable now contains the notion of 
order, unlike a nominal variable where each level is merely qualitatively different 
from the others. 
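
As a minimal sketch of such an ordinal coding, the three labels can be mapped onto ranks. The ordering untrained < amateur < professional is the assumption discussed above, and the names LEVELS and RANK are hypothetical; the data are the ordinal column of Table 9.1.

```python
from statistics import median_low

# Assumed ordering of the three levels, lowest to highest.
LEVELS = ["untrained", "amateur", "professional"]
RANK = {level: rank for rank, level in enumerate(LEVELS, start=1)}

# Ordinal "training" column of Table 9.1, listeners 1 to 20 in order.
training_ordinal = [
    "professional", "untrained", "amateur", "amateur", "untrained",
    "untrained", "amateur", "professional", "untrained", "amateur",
    "amateur", "untrained", "untrained", "professional", "professional",
    "professional", "professional", "amateur", "untrained", "professional",
]

ranks = [RANK[level] for level in training_ordinal]

# Order comparisons are meaningful for ordinal data...
assert RANK["professional"] > RANK["amateur"] > RANK["untrained"]

# ...and so is a rank-based summary such as the median level.
median_level = LEVELS[median_low(ranks) - 1]
print("Median level of training:", median_level)
```

The ranks 1, 2, and 3 record order only. Nothing here implies that the step from untrained to amateur is the same size as the step from amateur to professional, which is why a median is used rather than a mean.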

Nominal and ordinal data, and their associated statistical analyses, are rarely 
seen in empirical research applied to music. Instead, most studies collect and ana- 
lyze continuous data. Rather than asking someone to categorize herself into one of a 
number of ordinally differentiated levels, or measuring gross differences in perform- 
ance on some task, it is more commonplace to measure or categorize behavior along 
a continuum. Time, distance, and speed are common examples of continuous vari- 
ables. In order to measure training or expertise along a continuous scale, subjects 
might be asked how long they have been playing an instrument (as in the third column
of Table 9.1), or how many hours per week they spend practicing. The variable
now captures not only order and magnitude, but also the relative distance between 
data points. A performer with eight years of professional experience has twice as 
much professional playing time as one with four years of experience; one with six 
years falls midway between the other two. Continuous variables allow for some 
subtle and sensitive statistical tests, which is why they are generally preferred. How- 
ever, close perusal of Table 9.1 shows that continuous data can have disadvan- 
tages — it might be helpful to be able to distinguish between professional and ama- 
teur musicians who may have equivalent years of experience. 
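
The overlap between amateurs and professionals can be made concrete with a short sketch. It assumes the ordinal and continuous columns of Table 9.1 have been entered in listener order; the value for listener 9 is not legible in the table and is therefore recorded as missing (None) rather than guessed.

```python
from statistics import mean

# Ordinal and continuous "training" columns of Table 9.1, listeners 1 to 20.
levels = [
    "professional", "untrained", "amateur", "amateur", "untrained",
    "untrained", "amateur", "professional", "untrained", "amateur",
    "amateur", "untrained", "untrained", "professional", "professional",
    "professional", "professional", "amateur", "untrained", "professional",
]
years = [20, 2, 18, 12, 1, 3, 5, 13, None, 10,
         12, 1, 2, 40, 28, 25, 30, 35, 5, 32]

# Group the years of training by level, skipping the missing value.
by_level = {}
for level, n_years in zip(levels, years):
    if n_years is not None:
        by_level.setdefault(level, []).append(n_years)

for level, values in by_level.items():
    print(f"{level}: min={min(values)}, mean={mean(values):.1f}, max={max(values)}")

# The most experienced amateur has played for longer than the least
# experienced professional, so years alone cannot separate the two groups.
overlap = max(by_level["amateur"]) > min(by_level["professional"])
print("Amateur and professional ranges overlap:", overlap)
```

Unlike the ordinal ranks, these values support arithmetic such as means and ratios, but the grouping information has to be kept alongside them if the amateur/professional distinction matters to the study.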

One final distinction can be made between variable types, which is vital when 
analyzing data. Statistical tests are generally divided into two categories: parametric, 
so called because they are designed to deal with data that come from a continuous 
parameter or distribution; and nonparametric, which make no such assumption. 
Parametric tests on the whole are more sensitive and give more detailed results. The 
simplest way of determining whether a parametric test is appropriate is to ask the 
following two questions: (1) are the data continuous, and (2) do the data resemble 
a normal distribution? The answer to question (1) is covered above, although some 
researchers differ on whether they regard nonstandard measurements (other than 
time, distance, velocity, mass, and so on) to be truly continuous. The answer to (2) 
is more complex and will be addressed below on p. 207. 
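
Purely as an illustration of how the two questions might be turned into a first screening step, the sketch below uses skewness as a crude indicator of departure from a normal distribution. This is an assumption made for the example, not the procedure set out later in the chapter: the cut-off value is arbitrary, the function names are hypothetical, and the data are invented.

```python
from statistics import mean, pstdev


def skewness(values):
    """Third central moment divided by the cubed population standard deviation."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("skewness is undefined when all values are identical")
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)


def passes_rough_screen(values, is_continuous, max_abs_skew=1.0):
    """Return True only if both questions receive a provisional 'yes'.

    max_abs_skew is an arbitrary illustrative cut-off, not a published standard.
    """
    if not is_continuous:          # question (1)
        return False
    return abs(skewness(values)) <= max_abs_skew   # rough stand-in for question (2)


# Hypothetical measurements, e.g. inter-onset intervals in milliseconds.
symmetric = [480, 490, 500, 510, 520]
skewed = [480, 485, 490, 500, 900]

print(passes_rough_screen(symmetric, is_continuous=True))
print(passes_rough_screen(skewed, is_continuous=True))
```

A screen of this kind can only flag obvious problems; inspecting a plot of the data remains advisable before committing to a parametric test.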

Measurement and Observation 

The most uncontroversial way of collecting data is by direct measurement of what a 
listener or performer does while listening or performing. Provided that the tools
for making measurements are accurate and correctly calibrated, such an approach 
helps avoid bias or observer-induced variability, and tends to produce data that can
be analyzed using parametric statistical tests. This does not remove all doubt, how- 
ever, since even the most accurately measured aspects of human performance may 
turn out not to bear upon the questions asked. 

A clear example of such direct measurement, which I have already mentioned, 
is the long tradition of collecting and analyzing performance data from the mecha- 
nism of pianos. Seashore (1938) describes a large body of work from the early part 
of the last century, which has been continued and extended by researchers such as 
Shaffer (e.g., 1981) and Palmer (e.g., 1989). Such techniques normally allow re- 
searchers to collect timing and intensity data (see chapter 5, this volume). 

It is also possible to collect indirect data about performance by observing or 
measuring listeners' responses. There are instances in which researchers observe 
and record their own responses, but it is generally preferable to use a qualified 
group of observers or listeners in order to avoid accusations of researcher bias. 
Repp (1997) used a listening panel's rank orderings of performances to derive a 
numerical index of preferences for different performances of the same piece. Simi- 
larly, Williamon (1999) used expert and novice ratings of different performances 
as a measure of listeners' preferences, and Clarke and Windsor (2000) asked lis- 
teners both to categorize performances by edition and to rate each performance 
for its aesthetic quality. Such listening panels can be thought of as a method of 
gaining either indirect data about performance, or direct data about listeners' 
responses. 
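
One simple way of turning a panel's rank orderings into a numerical index is to average the ranks each performance receives. The sketch below illustrates that idea only; it is not a reconstruction of the procedure used in any of the studies cited, and the judges, performances, and rankings are invented.

```python
from statistics import mean

# Hypothetical rank orderings (1 = most preferred) from three judges
# for four performances labelled A to D.
rankings = {
    "judge_1": {"A": 1, "B": 2, "C": 3, "D": 4},
    "judge_2": {"A": 2, "B": 1, "C": 4, "D": 3},
    "judge_3": {"A": 1, "B": 3, "C": 2, "D": 4},
}

performances = sorted(next(iter(rankings.values())))

# Mean rank per performance: lower values indicate stronger preference.
mean_rank = {
    p: mean([judge_ranks[p] for judge_ranks in rankings.values()])
    for p in performances
}

# Overall order of preference implied by the panel.
order = sorted(performances, key=mean_rank.get)

for p in order:
    print(f"Performance {p}: mean rank {mean_rank[p]:.2f}")
```

Averaging ranks treats ordinal judgments as if they were evenly spaced, so the resulting index is best read as a convenient summary rather than a true continuous measurement.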

The difficulty with directly observing or measuring listeners' activity in laboratory- 
style contexts is that there may be little or no overt behavior to capture. One way round 
this is to use physiological measures (e.g., Krumhansl 1997). In social situations, how- 
ever, there may be many actions to observe and record: North and Hargreaves (1996), 
for example, recorded the number of times an advice desk was visited while different 
music was played, and even the country of origin of the wine that was purchased 
when music with different national associations was played in a supermarket (North, 
Hargreaves, and McKendrick 1999). 

In summary, by observing and measuring the actions people make when en- 
gaged in music-making or listening, it is possible to obtain accurate, and often con- 
tinuous, data that allow for considerable statistical power and flexibility when ana- 
lyzed. Before turning to methods of manipulating, representing, and analyzing such 
data, the main alternative to direct observation and measurement will be addressed. 

Testing 

There are many situations in which direct measurement or observation of behavior 
is inappropriate. Although perceptual questions may be addressed by observing 
subjects' performance on a related task, it is often more appropriate to access their 
responses by asking them to provide a written, verbal, or diagrammatic response to 
some stimulus material; this is particularly the case where subjects' attitudinal, af- 
fective, or qualitative responses to a situation or question are at issue. Established 
psychometric tests, such as those designed by Wing (1981) or Seashore (1919) to 
assess musical aptitude or achievement, can have a useful function in providing information
against which to judge performance in more specific tasks, whether or not
one believes that such tests have a role to play in applied educational contexts. Sim- 
ilarly, more general tests of subjects' mental states may be useful: for instance the 
State-Trait Anxiety Inventory (STAI) assessment of anxiety can be used to shed light 
on subjects' performance in a particular situation (Abel and Larkin 1990). 

One of the simplest kinds of test is that in which subjects listen to a series of 
sounds which differ in some way and choose from a limited range or number of re- 
sponses. Windsor (1993), for example, asked subjects to classify rhythms into two 
categories, and in a second experiment to decide whether a given rhythm is the same 
as the preceding one; such data can then be represented in terms of the number of 
responses in each category for each rhythm, or the number of correct responses in 
the second task. More subtle perceptual judgments can be arranged to form a con- 
tinuous scale. Grey (1977) asked subjects to rate the similarity of different instru- 
mental timbres, while Krumhansl and Kessler (1982) played subjects a set of tones 
and then asked them to rate how well a further (or "probe") tone fitted within this 
context. Both of these examples restricted subjects' responses to a scale of integers 
(common scales run from 1 to 5, or from 1 to 7), with higher numbers reflecting
greater similarity, or degree of fit. Although strictly speaking such designs
generate ordinal rather than continuous data, it is common to regard the data as con- 
tinuous, the assumption being that subjects use the entire scale and do so in a rea- 
sonably continuous fashion. It is, however, possible to obtain truly continuous data; 
Clarke and Krumhansl (1990), for example, asked subjects to indicate their re- 
sponse with a pencil mark on a continuous line that could then be measured. Such 
responses along a continuum can be used to create multidimensional analyses, 
where listeners are asked to make more than one judgment for each sound they hear: 
Juslin (1997), for example, used responses along a number of such continua to as- 
sess listeners' emotional responses to passages of music. 
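
The two kinds of response described above, integer ratings and marks on a continuous line, might be handled along the following lines. This is a sketch under stated assumptions: the 100 mm line length, the probe-tone labels, and all the ratings are invented for illustration and are not taken from the studies cited.

```python
from statistics import mean

LINE_LENGTH_MM = 100.0  # assumed length of the printed response line


def mark_to_score(mark_mm, line_length_mm=LINE_LENGTH_MM):
    """Convert a measured pencil-mark position into a score between 0 and 1."""
    if not 0 <= mark_mm <= line_length_mm:
        raise ValueError("mark lies outside the response line")
    return mark_mm / line_length_mm


# Hypothetical goodness-of-fit ratings on a 1-7 scale:
# three listeners rated each probe tone.
ratings = {
    "C": [7, 6, 7],
    "F#": [2, 1, 3],
    "G": [6, 5, 6],
}

# Treating the ratings as continuous, summarize each probe tone by its mean.
mean_rating = {probe: mean(values) for probe, values in ratings.items()}

for probe, value in mean_rating.items():
    print(f"Probe {probe}: mean rating {value:.2f}")

print("Mark at 25 mm ->", mark_to_score(25.0))
```

Taking the mean of integer ratings relies on the assumption noted above, that listeners use the whole scale in a reasonably continuous fashion; the line-marking response avoids that assumption at the cost of an extra measurement step.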

Moving away from such directly perceptual tasks, but staying in the area of mul- 
tidimensional studies, Sloboda, Davidson, Howe, and Moore (1996) provide an ex- 
ample of collecting retrospective and longitudinal data: in their many studies of the 
role of practice in musical development, a battery of contrasted measures was used. 
This allowed for comparison of the many different factors that might have some ef- 
fect on instrumental success. However, such studies require careful control; it is 
often easier to interpret data from a smaller number of measurements, and if deci- 
sions can be made to facilitate this, the analysis can be less complex and provide 
much clearer results. 



Coding, Organizing, and Transforming Data 

Having collected data, it is often necessary to convert them from one format into an- 
other, and to organize and store them in a practical manner. Although this can be a 
laborious task, there are some excellent software packages that can help with this. 
This section explores these preliminary, pre-analysis stages in data-handling. 
Recoding



It may be necessary or convenient to notate observations in a different format from 
that required by a particular analysis. For example, subjects might be asked to indi- 
cate which of two possible performers they are hearing on a number of occasions. 
The data of interest, however, might be whether the listeners are correct or incorrect 
in their judgments. Hence, one listener's responses to 20 performances might be
recoded as shown in Table 9.2. A correct response is represented by 1, an incorrect
one by 0. Although this can be achieved by hand, it is also possible to recode data 
automatically, using a formula in a spreadsheet program. 
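
The automatic recoding described above can be sketched in a few lines of Python rather than a spreadsheet formula. The response and performer lists below are invented for illustration (they are not the values of Table 9.2), and the variable names are assumptions.

```python
# Hypothetical data: the performer (1 or 2) the listener named on each trial.
responses = [1, 2, 2, 1, 1, 2, 1, 2, 2, 1]
# Hypothetical data: the performer who was actually playing on each trial.
actual = [1, 2, 1, 1, 2, 2, 1, 1, 2, 1]

# Recode each trial as 1 (correct) or 0 (incorrect).
correct = [1 if r == a else 0 for r, a in zip(responses, actual)]

# Summaries that a later analysis might use.
n_correct = sum(correct)
proportion_correct = n_correct / len(correct)

print(correct)
print(n_correct, proportion_correct)
```

The recoded column keeps one value per trial, so it can be stored alongside the raw responses rather than replacing them.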

Another instance in which considerable recoding might be needed is where 
qualitative data have been recorded, but are to be analyzed quantitatively. For ex- 
ample, a verbal transcript from an interview could be recoded such that certain 
themes, phrases, or words are assigned to categories. The frequency with which each 
category of utterance is used can then be calculated on the basis of these labels. 
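
As a minimal sketch of the frequency count described above, assuming each utterance in a transcript has already been assigned a category label by hand (the labels here are invented):

```python
from collections import Counter

# Hypothetical coded transcript: one category label per utterance.
coded_utterances = [
    "practice", "anxiety", "practice", "teacher", "practice", "anxiety",
]

# Frequency of each category.
category_counts = Counter(coded_utterances)

# Relative frequency of each category (proportion of all utterances).
total = sum(category_counts.values())
relative = {category: n / total for category, n in category_counts.items()}

print(category_counts.most_common())
print(relative)
```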

A more complex problem is the recoding of data acquired by some form of instrumentation.
For example, MIDI data may need to be recoded to give real times in milliseconds,
rather than the sequencer values of bars, beats, and ticks: although some commercial
sequencer packages allow this, many do not, or do not allow easy archiving of such
data. Fortunately there are some helpful tools for such purposes.

Table 9.2. Imaginary data for a listener's attempts to identify the performer (1 or 2) in
20 performances.

Columns: Performance (1-20), Response, Actual Performer, Correct?
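
The conversion from bars, beats, and ticks to milliseconds can be sketched as follows. This is a simplified illustration, not the behavior of any particular sequencer: it assumes a constant tempo, a constant meter, a quarter-note beat, 1-based bar and beat numbers, and a fixed tick resolution. The function name and default values are assumptions.

```python
def position_to_ms(bar, beat, tick, bpm=120.0, beats_per_bar=4, ppq=480):
    """Convert a bar/beat/tick position to milliseconds from the start.

    Assumes constant tempo (bpm), constant meter (beats_per_bar), and
    ppq ticks per beat. Bars and beats count from 1, ticks from 0.
    """
    beats_elapsed = (bar - 1) * beats_per_bar + (beat - 1) + tick / ppq
    ms_per_beat = 60000.0 / bpm
    return beats_elapsed * ms_per_beat


# Example: halfway through the second beat of bar 1 at 120 beats per minute.
print(position_to_ms(1, 2, 240))
```

A real performance recorded against a changing tempo map would need the tempo events themselves, so a sketch like this only applies where the sequencer clock ran at a fixed rate.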

Some Helpful Tools 

Since almost all statistical analyses are now carried out by computer, it is helpful to
store data in machine-readable form at the earliest possible stage. Many researchers
now have subjects input their responses directly to a computer, ensuring that no er- 
rors in transcription can occur and that the data are ready for analysis as soon as col- 
lected. This is sometimes possible even without programming expertise. The soft- 
ware package MEDS, for example, allows the construction of complex experiments, 
the presentation of audio, visual, and MIDI stimuli, and the accurate recording of 
subject responses. 1 Issues of compatibility are paramount here. It wastes enormous 
amounts of time if data need to be converted from format to format, or between 
computers of different kinds. It is sensible to find a combination of data collection, 
organizational, and analysis tools that all run on the same computer system and can 
read and write the same types of file. There will inevitably be circumstances when 
some reorganization or conversion of data may be required, but this should be kept 
to a minimum, both to decrease the risk of accidental loss of data and to ensure that 
errors are not introduced by inaccurate conversion. 

For example, consider the chain of data-conversion involved in the analysis of 
moment-to-moment timing in piano performance. The pianist, seated at a MIDI 
piano such as a Yamaha Disklavier, performs an excerpt from a Chopin nocturne 
three times. The piano is connected to a laptop computer via MIDI cables and a MIDI 
interface, and MIDI data are transmitted from the mechanism of the piano to a lap- 
top. At this point there is a choice. It would be possible to use commercial software 
consisting of a sequencer package, a spreadsheet, and a statistics package, but the al- 
ternative is to use a specialist application which transforms the MIDI data into a for- 
mat which is directly readable by a statistics package. 

The advantage of using commercial software is that flexibility is ensured at all 
stages of the process; the sequencer package is designed for easy recording and play- 
back, and probably has useful features for displaying the recorded MIDI data to 
check for errors in performance. The sequencer files may be saved in standard MIDI 
format, or (depending on the package) in the form of a text table of timings and 
other MIDI information, which can then be imported into a spreadsheet or statistics 
package and manipulated to obtain the information required. However, such a so- 
lution becomes extremely unwieldy where there are large amounts of data, or where 
data need to be explored in anything other than the simplest fashion. The researcher 
might want to know the duration of each bar, rather than the durations between suc- 
cessive notes, or the interonset intervals between notes in the left hand only, or the 
durations between successive events in each chord, and might want to do this re- 
peatedly for a number of different performances of the same piece. In such circum- 
stances some automation of data-handling and a degree of "intelligence" may be de- 
sirable, or even essential. Many researchers have written their own software to meet 
such needs, some of which is available as commercial software, shareware, or
freeware. POCO, a software environment designed by Desain and Honing (Honing
1990, see also p. 81 above), allows complex analyses involving the conversion of MIDI
data from format to format and the extraction of different aspects of a dataset using
prewritten routines. Like all specialist software tools, POCO has a steep learning curve,
but it can repay the effort by reducing the chances of human error, and by enabling
quite complex reorganizations and analyses of data to be carried out extremely quickly.
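
The kind of automated extraction mentioned above (bar durations, or interonset intervals in the left hand only) can be sketched in plain Python. This is not POCO and makes no claim about how POCO works. The note list is invented, and it assumes each onset has already been labeled with a bar number and a hand.

```python
# Hypothetical note list: (onset in ms, bar number, hand).
notes = [
    (0, 1, "L"), (10, 1, "R"), (520, 1, "R"), (1010, 1, "L"),
    (2050, 2, "L"), (2060, 2, "R"), (3100, 2, "L"),
    (4200, 3, "L"),
]


def interonset_intervals(onsets):
    """Return the differences between successive onset times."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


# Interonset intervals for left-hand notes only.
left_onsets = sorted(onset for onset, _, hand in notes if hand == "L")
left_iois = interonset_intervals(left_onsets)

# Duration of each bar, measured from its first onset to the first onset
# of the following bar (so the final bar has no duration here).
first_onset = {}
for onset, bar, _ in notes:
    first_onset[bar] = min(onset, first_onset.get(bar, onset))
bars = sorted(first_onset)
bar_durations = {
    bar: first_onset[next_bar] - first_onset[bar]
    for bar, next_bar in zip(bars, bars[1:])
}

print(left_iois)
print(bar_durations)
```

Once wrapped in functions, the same steps can be repeated for each performance of the piece without retyping anything, which is where the reduction in human error comes from.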

There are also instances where data need to be transformed in more substantial 
ways, for example to remove "drift" (such as steadily decreasing tempo in a per- 
formance) or "noise" (random variability). Many software packages allow such de- 
trending or smoothing of data, but it is always wise to consider whether the noise or 
drift can be removed without distorting the results. 
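
Two simple transformations of this kind are sketched below: removing a straight-line trend, and smoothing with a short moving average. These are only the most basic forms of detrending and smoothing, chosen for illustration. The function names and the sample durations are assumptions.

```python
def linear_detrend(values):
    """Subtract the least-squares straight-line trend, keeping the mean."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return [y - slope * (x - mean_x) for x, y in zip(xs, values)]


def moving_average(values, window=3):
    """Centered moving average; the window is truncated at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical bar durations (s) from a performance that gradually slows down.
bar_durations = [4.0, 4.3, 4.1, 4.5, 4.4, 4.8, 4.6, 5.0]
print(linear_detrend(bar_durations))
print(moving_average(bar_durations))
```

Comparing the transformed series with the original is a quick way to check whether the transformation has removed something of musical interest, such as a deliberate ritardando, along with the drift.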

Whenever quantitative analysis is required, a method of transforming test re- 
sults, measurements, or observations into machine-readable form, and software to 
aid in organizing such data, can be a real boon. Whether a spreadsheet or some more 
sophisticated solution is chosen will depend upon the amount of data, the extent of 
transformation required, and whether the data need to be passed on to additional 
software for statistical analysis. 



Descriptive Statistics 

In this section some relatively simple ways of describing different kinds of data will 
be introduced: first, frequency representations and the notion of the normal distri- 
bution; second, ways of expressing the central tendency of sets of data and their dis- 
persion. Rather than provide a detailed guide to these methods, which can be found 
in introductory texts on statistics (e.g., Robson 1983; Miller 1984), I will show how 
they can be applied to musical situations in an informative way. 

Frequencies and Distributions 

Assuming that the data in question are continuous, the first question to ask is whether 
they can be examined using parametric tests. As previously mentioned (p. 202), this 
depends on whether the data are normally distributed. The simplest way of deter- 
mining this is to start by counting the number of occurrences — the frequency — of 
each measurement. For example, imagine that eighty musicians were asked to play 
the same piece of music and, using a stopwatch, the duration of each performance 
was measured. Hypothetical durations are shown in Table 9.3. 

Table 9.3. Imaginary data (in seconds) for the durations of 80 performances of the same
piece of music

Perf No.  Duration (s)    Perf No.  Duration (s)    Perf No.  Duration (s)    Perf No.  Duration (s)
 1        121.90          21        125.80          41        121.91          61        127.00
 2        121.48          22        120.95          42        121.06          62        121.14
 3        131.12          23        128.56          43        131.46          63        129.25
 4        119.70          24        119.41          44        120.92          64        119.53
 5        120.20          25        125.60          45        120.38          65        125.14
 6        121.30          26        127.70          46        120.81          66        127.24
 7        114.75          27        116.66          47        115.71          67        114.86
 8        117.88          28        119.14          48        118.37          68        119.28
 9        119.06          29        118.26          49        119.03          69        118.67
10        121.57          30        122.03          50        122.82          70        121.91
11        124.17          31        126.95          51        125.36          71        127.45
12        121.49          32        124.21          52        121.88          72        124.93
13        117.79          33        116.25          53        116.52          73        115.39
14        123.14          34        123.07          54        123.16          74        123.79
15        118.87          35        119.74          55        118.67          75        119.72
16        111.05          36        111.59          56        112.27          76        108.82
17        113.80          37        113.98          57        114.13          77        113.99
18        116.31          38        117.55          58        115.18          78        116.63
19        119.83          39        121.74          59        120.25          79        121.52
20        124.30          40        121.46          60        125.45          80        121.68

Since every performance in Table 9.3 has a unique duration, each one has the
same frequency of occurrence (1), and no sense of their distribution can be obtained.
In order to see a distribution, the data need to be reorganized into categories, showing
the number of performances that fall within each of a series of equal-sized durational
"bands." Figure 9.1 shows the data as a histogram, with the number of performances
within each 2.5 second band of durations (from 100 to 140 seconds)
indicated by each bar. If the data are normally distributed, the most common duration
category (that with the highest frequency) will be midway between the lowest
and highest categories, and the frequencies of the other categories will taper off toward
the extremes in a manner similar to that shown by the curve in Figure 9.1. The
data resemble the curve quite closely, with the majority of the performances being
around 120 seconds long; they are slightly "skewed" to the right, in that there are
more performances that are longer than 120 seconds than shorter, but the imbalance
is only marginal. Compare this with Figure 9.2, which shows some data that do not
resemble a normal distribution: here the data are more evenly spread across their
range, and there are very few performances between 125 and 130 seconds (where a
peak would be expected in a normal distribution). These data would not be suitable
for parametric analyses.
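
The grouping of durations into 2.5 second bands can be sketched as follows. To keep the example short it uses only the first 20 durations from Table 9.3, so the counts are not those of Figure 9.1. The variable names and the text-only display are assumptions.

```python
# Durations (s) of performances 1-20 from Table 9.3.
durations = [
    121.90, 121.48, 131.12, 119.70, 120.20, 121.30, 114.75, 117.88, 119.06, 121.57,
    124.17, 121.49, 117.79, 123.14, 118.87, 111.05, 113.80, 116.31, 119.83, 124.30,
]

LOW, HIGH, WIDTH = 100.0, 140.0, 2.5   # bands from 100 to 140 s, 2.5 s wide
n_bins = int((HIGH - LOW) / WIDTH)     # 16 bands

# Count how many durations fall in each band [lower, lower + WIDTH).
counts = [0] * n_bins
for d in durations:
    index = int((d - LOW) // WIDTH)
    counts[index] += 1

# Crude text histogram: one '#' per performance in the band.
for i, count in enumerate(counts):
    lower = LOW + i * WIDTH
    print(f"{lower:6.1f}-{lower + WIDTH:6.1f}  {'#' * count}")
```

The choice of band width matters: bands that are too narrow return to the situation where almost every band holds one performance, while bands that are too wide hide the shape of the distribution.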

Central Tendencies 

There are three accepted ways of expressing the central tendency of a set of data as 
a single number. The first is by finding the most frequent value, known as the mode. 
The second is by arranging the data in ascending order (also known as rank order) 
and seeing which value lies in the middle of this sequence: this is known as the me- 
dian. The third involves calculating an arithmetic average (i.e., adding the values to- 
gether and dividing by the number of values): this is known as the mean. 

Calculating the mode requires at least two identical values in the data, and this
is one of its limitations as a descriptive statistic, especially if there are small amounts
of data (a small "sample size"). Even with 80 samples, the data in Table 9.3 have no
modal value, since no performances were of the same duration. The median (middle
value) falls between 120.81 (the 40th value) and 120.92 (the 41st). In cases like this,
where there is an even number of data points and thus no true middle value, the median
is taken as that value that lies midway between the two central data points
(120.87 in this case). The most commonly used type of average, however, is the
arithmetic mean, which in this case is 120.545 (see Robson 1983, on calculating
both median and mean values). This captures the "central tendency" of the data
quite well, since most of the performances were close to this duration. In other words,
the mean is a useful way to describe the overall or general response to the task.

Figure 9.1. A histogram of the data from Table 9.3 with the
corresponding normal distribution (continuous line) superimposed.

Figure 9.2. A histogram of data that are not normally
distributed, with the normal distribution superimposed.

Dispersion 

As well as having a central tendency, a set of continuous data will also have some 
spread of values, a certain dispersion. A very rudimentary way of expressing this 
would be simply to state the difference between the lowest and highest value — the 
range. In the example above, the lowest value was 108.82, the highest 131.46, giv- 
ing a range of 22.64: the fastest performer played the piece 22.64 seconds faster than 
the slowest. This might seem worrying: if two performers can differ by this amount, 
how can one draw any general conclusions about the performances? The answer is 
not only to measure the dispersion of the data but also to compare it to the mean, 
and the most common way of doing this is to calculate the standard deviation. Each 
measurement will differ from the mean by a certain amount, and the standard devi- 
ation expresses the average of these differences, in units that are comparable to the 
individual data (see Robson 1983: 54-60); the calculation can be done automati- 
cally using a scientific calculator or spreadsheet (as can that for the mean). The stan- 
dard deviation of the durations in Table 9.3 is 4.592 seconds, showing that despite 
the large range, most of the performances are not that different in duration: 4.6 sec- 
onds is only around 4% of the mean duration. 
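
The summary figures quoted in this section can be recomputed from Table 9.3 with Python's standard `statistics` module, as a sketch of what a calculator or spreadsheet would do. The text does not say whether its standard deviation uses the n or the n - 1 denominator, so both are computed here. The variable names are assumptions.

```python
import statistics

# Durations (s) of performances 1-80 from Table 9.3, in performance order.
durations = [
    121.90, 121.48, 131.12, 119.70, 120.20, 121.30, 114.75, 117.88, 119.06, 121.57,
    124.17, 121.49, 117.79, 123.14, 118.87, 111.05, 113.80, 116.31, 119.83, 124.30,
    125.80, 120.95, 128.56, 119.41, 125.60, 127.70, 116.66, 119.14, 118.26, 122.03,
    126.95, 124.21, 116.25, 123.07, 119.74, 111.59, 113.98, 117.55, 121.74, 121.46,
    121.91, 121.06, 131.46, 120.92, 120.38, 120.81, 115.71, 118.37, 119.03, 122.82,
    125.36, 121.88, 116.52, 123.16, 118.67, 112.27, 114.13, 115.18, 120.25, 125.45,
    127.00, 121.14, 129.25, 119.53, 125.14, 127.24, 114.86, 119.28, 118.67, 121.91,
    127.45, 124.93, 115.39, 123.79, 119.72, 108.82, 113.99, 116.63, 121.52, 121.68,
]

mean = statistics.mean(durations)              # arithmetic mean
median = statistics.median(durations)          # midpoint of 40th and 41st values
value_range = max(durations) - min(durations)  # highest minus lowest
sample_sd = statistics.stdev(durations)        # n - 1 denominator
population_sd = statistics.pstdev(durations)   # n denominator

print(f"mean = {mean:.3f}, median = {median:.3f}, range = {value_range:.2f}")
print(f"sample SD = {sample_sd:.3f}, population SD = {population_sd:.3f}")
print(f"SD as a percentage of the mean = {100 * sample_sd / mean:.1f}%")
```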

Useful Graphs 

Musical data often consist of successive measurements of some variable over time,
and in such a case the clearest way to display them is in the form of a line graph or bar
chart. The upper panel of Figure 9.3 shows the durations of successive bars in a per- 
formance of a 27-bar excerpt from Chopin's Nocturne Opus 27 no. 1, and was pro- 
duced using a statistical package (most spreadsheet packages will also generate 
simple graphs and charts); the horizontal axis represents the succession of bars, the 
vertical axis the duration of each bar in seconds. This type of line graph is a com- 
mon way of representing changes in local timing over the course of a piece. How- 
ever, it is open to misinterpretation. For instance, it is wrong to assume that the lines 
connecting each point represent continuous changes in tempo; the lower panel of 
Figure 9.3 shows the duration in seconds of each triplet quaver for the same stretch 
of music, illustrating that below the level of the bar the individual notes do not fall 
along neat lines or curves. 

Another useful graph for plotting multiple sets of data is the scattergram. Figure 
9.4 shows the durations of Figure 9.3 plotted alongside equivalent data for a second 
performance of the same piece. It can be seen from the line graph that the profiles 
of the two performances are similar, and the scattergram in the lower panel repre- 
sents this more directly by plotting the durations of each bar in the first performance 
against the equivalent duration in the second. (Note, for instance, how the isolated 
value at the top right of the scattergram corresponds to the long duration of bar 26 
in both performances.) If the performances were identical, the points would lie
exactly along a diagonal line from the bottom left to top right; by contrast, if there were
no relationship between the performances, no overall pattern would be discernible.
In the present case, the similar profiles of the two performances in the line graph
result in a distribution quite close to the diagonal line. This demonstrates a close
relationship between the two data sets, in other words a correlation between them.

Figure 9.3. Data from a performance of a 27-bar excerpt from Chopin's Nocturne,
Opus 27 no. 1. The upper panel shows the durations of successive bars, while the
lower panel shows the duration in seconds of each triplet quaver for the same stretch
of music.

Bar charts representing the means of different groups of data may also be useful,
but are to be avoided where they suggest differences that are not statistically
significant (see below: some form of statistical comparison of means is necessary to
estimate such significance). Figure 9.5 shows the mean bar durations of the two
performances discussed above, the height of the caps (which resemble Ts) on top of
the bars showing the standard deviations of the two means. This chart immediately
demonstrates the small absolute difference between the means, and between the
standard deviations, just as the line graph and scattergram show the similarities at
the level of individual measurements. Putting all this together, we can say on the
basis of Figures 9.4 and 9.5 that the two performances have not only a similar pattern
of relative timing from bar to bar, but also a similar overall tempo (mean bar
durations), and similar variability between individual bars (standard deviations).

Figure 9.4. Data from two performances of a 27-bar excerpt from Chopin's
Nocturne, Op. 27, no. 1. The upper panel shows raw data for the durations of
successive bars. The lower panel is a scattergram which plots the duration of each
bar in the first performance against the equivalent duration in the second. The
isolated value at the top right of the scattergram corresponds to the long duration
of bar 26 in both performances.

Figure 9.5. Mean bar durations for the two performances shown in Figure 9.4. The
height of the caps (which resemble Ts) on top of the bars shows the standard deviations of
the two means.



Comparative Statistics 

Although describing a single set of data may be useful and interesting, most empir- 
ical research proceeds by comparing two or more sets of data. A classical experiment 
tends to compare the same variable under different conditions, for example, by mea- 
suring the durations of performances under examination and rehearsal conditions. 
It may also be necessary to compare different variables: correlating supposedly re- 
lated factors determines whether there is some predictable relationship between 
them, and how strong it is. For example, a correlation of the duration of perform- 
ances with factors such as age, a measure of experience, average heart rate, or anxi- 
ety levels might uncover systematic relationships between these variables. In this 
section a number of different ways of carrying out such statistical comparisons will 
be demonstrated. The calculations will not be shown in detail, but reference will be 
made to standard statistical texts and software. 



Comparison of Means: The t-Test 

In addition to the 80 pianists performing a short piece in rehearsal measured above, 
imagine that the durations of 80 more performances under examination conditions 
were measured. The mean duration of the performances in the rehearsal condition 
was 120.545 seconds; the mean duration for the exam condition turns out to be
131.496 seconds. In other words, the exam performances are, on average, about 11 
(10.951) seconds slower than the rehearsals. This seems like a large difference, but 
the absolute difference between the means is not a reliable indicator of a significant 
difference between the two groups. The criterion that many statistical tests use to de- 
termine this is based on the probability that such a difference is due to chance. This 
would clearly be the case if the exam performances were measured again on a dif- 
ferent occasion and their mean duration this time was actually smaller than that of 
the rehearsal performances. Statistical tests allow one to calculate the likelihood that 
the differences are actually related to differences between the two conditions, and are 
not due to random variation. 

The t-test is a relatively simple parametric test that can be used to determine 
this. Although it can be carried out using a calculator, or pen and paper, most sta- 
tistical software packages calculate t-tests quickly and easily; however it is worth- 
while calculating t-tests by hand to grasp how they work before using a statistical 
package (Robson 1983: 76-89 gives an excellent guide). In brief, the t statistic is es- 
sentially an arbitrary index of the size of an effect, and is based on a comparison of 
all the individual data in each of the two conditions being compared (in this case the 
80 durations in rehearsal, and the 80 durations in examination): the larger the ef- 
fect, the larger the value of t. In the present case, the calculation results in a value of 
t = 14.924 (larger values are more likely to be significant), and the probability of the 
difference in means being due to chance is less than 1 in 1000 (p < .001). Normally, 
experimental results are considered significant if this probability value is less than or 
equal to 5 in 100 (p ≤ 0.05), so the result here is extremely reliable.

As a comparison, a t-test between the two sets of bar durations whose means are 
shown in Figure 9.5 gives a smaller but still significant value of t (3.395; p = 0.0022). 
Hence, the timing of the two performances was reliably different, but only by a tiny 
amount (85 milliseconds). 
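
To show what the t statistic is built from, here is a minimal sketch of the independent-samples form with pooled variance. The text does not state which variant of the t-test it used, so this choice is an assumption. The two small samples are invented and are not the exam data described above.

```python
import math
import statistics


def independent_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for two independent samples.

    Uses the pooled-variance (Student) form, which assumes the two groups
    have similar variances.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)   # sample variance, n - 1 denominator
    var_b = statistics.variance(sample_b)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error
    return t, n_a + n_b - 2


# Hypothetical durations (s) in two conditions.
rehearsal = [118.2, 121.5, 119.9, 122.4, 120.1]
exam = [129.8, 133.0, 131.2, 134.1, 130.4]

t, df = independent_t(exam, rehearsal)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Turning t into a probability requires the t distribution for the given degrees of freedom, which is what a statistical package supplies (SciPy's `scipy.stats.ttest_ind` is one example). Where the same bars or the same performers are measured twice, a paired t-test on the differences would be the more usual choice.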

Correlation 

I introduced the concept of correlation when discussing the use of graphs, particu- 
larly the scattergram. The relationship between two variables (or the same variable 
measured on different occasions) can be evaluated in a number of different ways. 
The most common parametric method is the product-moment correlation, or Pear- 
son's r. A number of nonparametric alternatives are also available, and Robson 
(1983) discusses both these and parametric correlations in some detail. 

The two panels of Figure 9.4 show a strong correlation between the bar by bar 
timings of the two performances. However, it is necessary to check whether any cor- 
relation is strong enough to be statistically significant — that is, not due to chance. A 
correlation coefficient indicates the magnitude of the relationship between two
variables and its direction: a value of 1 represents a perfect positive correlation, a value
of -1 is a perfect negative correlation, and 0 implies a completely random relationship
between the two variables. In the case of Figure 9.4 the correlation is 0.884, confirming the
strong positive relationship. However, there may be more complex circumstances
where both positive and negative correlations are involved. For example, Figure 9.6
shows imaginary scores (out of 150) for 10 violin students taking a performance
exam, plotted against the amount of practice (in hours) in the preceding 2 weeks,
and their level of anxiety (on a self-assessed scale between 0 and 100) just prior to
the exam.

Figure 9.6. Imaginary scores (out of 150) for 10 violin students
taking a performance exam, plotted against the amount of practice
(in hours).
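
The product-moment correlation itself is straightforward to compute. The following is a minimal sketch of Pearson's r written out from its definition. The two lists of bar durations are invented and are not the data of Figure 9.4.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical bar durations (s) for the same six bars in two performances.
performance_1 = [4.1, 4.3, 4.0, 4.6, 4.4, 5.2]
performance_2 = [4.0, 4.4, 4.1, 4.5, 4.5, 5.4]

print(f"r = {pearson_r(performance_1, performance_2):.3f}")
```

The function returns only the coefficient. Whether a given value of r is significant depends on the number of data points as well, which is why a statistical package reports a probability alongside it.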

There seems to be a negative relationship between anxiety and performance in 
the exam; low values of anxiety correlate with high exam scores. Also, it is easy to 
see that there is a positive correlation between practice and performance: high exam 
marks correlate with more hours spent practicing. In each case we can quantify 
the correlation, and the chances of it being a random effect: in the case of anxiety, 
r = -0.789 (p = 0.0046), whereas in the case of practice r = 0.94 (p < 0.0001). Both 
anxiety and practice, in other words, are strong predictors of examination perform- 
ance. But it is impossible to determine from this analysis which of the two variables 
is influencing performance directly. The most obvious interpretation is that practice 
enhances performance, and so decreases anxiety. However, it might be that practice 
has no direct effect on performance: instead it decreases anxiety, which in turn en- 
hances performance. It could even be that better musicians are simply less anxious 
but practice more. In other words, performance increases at the same time and rate 
as practice, and decreases with anxiety, but it is not clear what the relationship is be- 
tween these three variables. The issue may seem merely academic, but consider the 
following: if one claims that better performance is directly related to practice and 
anxiety, it would follow that both encouraging students who performed worse to 
practice more, and treating their anxiety, should improve their performance. On the 
other hand, if it is actually the case that better musicians perform well for other rea- 
sons, and that their success leads them to practice more and to have lower anxiety 
levels, then no amount of treatment or increased practice will improve the scores of 
the poorer performers. There are methods for making such causal connections
clearer, especially where there are more than two variables, some of which will be 
introduced below. 

More Complex Techniques 

Most experimental work uses statistical tests which are rather more complex than 
correlations and t-tests. It is impossible to deal with them in detail here, but it is 
worth mentioning the problems they can be used to solve, and the situations to 
which they have been applied in musical research. We have just seen that correla- 
tions are not necessarily a good way of discovering causal relationships. A set of re- 
lated techniques, including regression and partial correlation, can be used effectively 
in situations where causality or multiple variables are to be analyzed. Applied to the 
example above, it would be possible not only to investigate whether practice has a 
causal relationship with performance, but also to show how strong this relationship 
is when the effects of anxiety have been taken into account. 

Linear regression analysis tests whether two variables are related in a linear fash- 
ion; more exactly, it attempts to fit the data points to a line which has a certain slope. 
When plotted on a scattergram (see Figure 9.4, above) a close correlation appears as 
a diagonal relationship between the two variables, and regression is a means of quan- 
tifying the fit between line and data. Figure 9.7 shows the examination data in the 
same format; the regression is expressed as the square of a simple correlation (R²).
In this case the fit is very close: R² = 0.884. It should be noted that R² values, unlike
simple correlations, are always positive, so care has to be taken to check the
direction of the relationship.
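
A least-squares line and its R² can be computed as in the sketch below. The practice hours and exam scores are invented and are not the data behind Figure 9.6. The function name is an assumption.

```python
def linear_regression(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)   # square of Pearson's r
    return slope, intercept, r_squared


# Hypothetical data: hours of practice and exam score (out of 150).
practice_hours = [5, 8, 10, 12, 15, 18, 20, 22, 25, 30]
exam_scores = [62, 70, 81, 85, 96, 104, 110, 118, 125, 140]

slope, intercept, r_squared = linear_regression(practice_hours, exam_scores)
print(f"score = {intercept:.1f} + {slope:.2f} * hours, R^2 = {r_squared:.3f}")
```

The sign of the slope gives the direction of the relationship that R² alone cannot show. With a single predictor, R² is simply the square of the correlation: squaring the practice correlation of 0.94 quoted earlier gives approximately 0.884.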

It is possible to test whether the fit of the data to this line is significantly better 
than its fit to a horizontal line drawn at the mean of the predicted variable (in this 
case the performance score) on the vertical axis. This will indicate whether the data 
points are better fitted to the sloped regression line, as against the unsloped mean. 
In this case they are, and at a high level of significance (p < 0.0001). 

It is also possible to fit more than one predictor variable (practice and anxiety) 
to the predicted variable (performance), to see what their combined and individual 
predictive power is. This is done by stepwise regression: variables are added one at a 
time to see whether they improve the fit, and by how much. It is even possible to 
use multiple regression to fit the variables at the same time — though such complex 
techniques are not required in the present case. It is, however, always sensible to 
compute partial correlations between the variables, in order to see whether each one 
is significant when the influence of the other variables has been taken into account. 
The partial correlation between practice and performance, subtracting the in- 
fluence of anxiety, is still significant at well below the 0.05 criterion (r = 0.876, 
p < 0.01), but when the influence of practice is taken into account, anxiety is no 
longer significantly correlated with either performance (r = -0.497), or practice 
(r = 0.136); neither of these partial correlations is below the 0.05 probability level. 
In other words, the data do not support the notion that anxiety, in itself, had any 
significant effect on performance in the exam, nor that practice reduced anxiety.
The only supportable claim is that the amount of practice is related to an increase
in the exam mark.

Figure 9.7. The correlation between hours
of practice and anxiety level (taken from the
data shown in Figure 9.6), showing the
corresponding regression line.
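
With three variables, a partial correlation can be obtained from the three simple correlations using the standard first-order formula, sketched below. The text reports the correlations of practice and of anxiety with the exam score, but not the correlation between practice and anxiety. The value used for that here (-0.71) is a placeholder assumption chosen for illustration, so the printed result should not be read as reproducing the text's analysis.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with the influence of z removed."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_practice_score = 0.94      # quoted in the text
r_anxiety_score = -0.789     # quoted in the text
r_practice_anxiety = -0.71   # ASSUMED: not reported in the text

# Practice and score, with anxiety taken into account.
partial_practice = partial_correlation(
    r_practice_score, r_practice_anxiety, r_anxiety_score
)
# Anxiety and score, with practice taken into account.
partial_anxiety = partial_correlation(
    r_anxiety_score, r_practice_anxiety, r_practice_score
)

print(f"practice/score given anxiety: {partial_practice:.3f}")
print(f"anxiety/score given practice: {partial_anxiety:.3f}")
```

The pattern to look for is the one described above: a simple correlation that shrinks markedly once another variable has been taken into account was probably carried by that other variable.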

Just as regression analysis extends the notion of correlation, so the pairwise 
comparison of means afforded by the t-test can be extended using analysis of variance
(ANOVA). Whereas a t-test can only measure the relationship between one
pair of variables at a time, ANOVA can be used to show differences where there are 
more variables and more than one level within each variable. For example, to ex- 
tend the t-test example above, consider a situation in which student performers 
were playing under three different conditions: one week before an exam, one day 
before, and during the exam itself. Each performance is timed as before. Now a sec- 
ond variable is added: the performers are split into two equal groups of girls and 
boys and we have two variables — gender and time of performance. An ANOVA 
could be used to determine whether there were any significant changes in the du- 
ration of the piece as the exam approached, and whether these changes were the 
same for both girls and boys. 
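
The design described above has two variables, and the same students are measured repeatedly, so it calls for a two-way analysis with repeated measures of the kind a statistical package provides. The sketch below is deliberately simpler: a one-way ANOVA for independent groups, intended only to show how the F ratio compares variation between groups with variation within them. The durations are invented.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way independent-groups ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical durations (s): one week before, one day before, during the exam.
week_before = [119.5, 121.0, 120.2, 122.3]
day_before = [123.1, 124.8, 122.6, 125.0]
during_exam = [130.4, 131.9, 129.7, 133.2]

f_ratio, df_b, df_w = one_way_anova_f([week_before, day_before, during_exam])
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```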

Even more complex analyses are available where there are many possible rela- 
tionships between variables. For example, factor analysis allows the researcher to see 
which of a large number of continuous variables are most closely related, and to cate- 
gorize these relationships according to their salience. Repp (1992), for example, used 
factor analysis to analyze the commonalities in timing between different performances 
of the same piece of music. One result of factor analysis can be the extraction of a 
number of principal components from a large dataset. In Repp's analysis, three prin- 
cipal components were extracted, which could be seen as representing three clearly 
distinguishable ways in which the pianists timed their performances. This achieves a 
large reduction of complexity, with many potential dimensions becoming just a few. 
A related technique is multidimensional scaling, which has been used to represent
correlations, or judgments of similarity, between many variables as distances within 
a two- or three-dimensional space. Grey (1977) has used such techniques to repre- 
sent the perceptual distances between instrumental timbres along a number of di- 
mensions, while Kendall and Carterette (1991) have shown how such spaces can be 
used to make subtle predictions about instrumental "blend," and Krumhansl and 
Kessler (1982) have used them to represent distance between keys. 

Most of these more complex techniques are carried out with the help of com- 
puter software, but some understanding of how they are calculated is essential to 
making sense of their results. There are many guides to particular analyses, such as 
Sage Books' guides to ANOVA (Iversen and Norpoth 1976) and linear regression 
(Dunterman 1984); these are excellent starting points for the more mathematically 
minded. It is also helpful to use the sample data and tutorials that come with most 
statistical packages: there is no substitute for seeing how the techniques work on ac- 
tual data. 

Analyzing Frequency Data 

Thus far the examples have all used continuous data. Where frequency data are in- 
volved, cross-tabulation (crosstabs) and the chi-square statistic are essential tools. 
Imagine a situation in which the question is whether students are satisfied with dif- 
ferent aspects of a keyboard harmony course. Each student is asked two questions: 
(1) was the practical teaching satisfactory? and (2) was the theoretical teaching suc- 
cessful? Their responses can be displayed in a two-by-two table as shown in Table 9.4. 
The 10 students answered a total of 20 questions, and if they were responding 
randomly, or were undecided as a group, we would expect on average to see ten "yes" 
responses and ten "no" responses. However, we can see from the table that a clear 
majority were satisfied with the practical teaching, but not with the theoretical 
teaching. The chi-square statistic tests whether the distribution of the frequencies 
across the four cells is nonrandom. In other words it tests to see whether there is a 
significant difference between the observed frequencies and an assumed even distri- 
bution, which in the case of Table 9.4 would mean five responses in each cell of the 
matrix. In this case chi-square = 5.3, which is statistically significant below the 0.05 
level (p = .0213), so we can say that the students were responding differently to the 
two questions. Such cross-tabulation can also work where there are more than two 
response categories (such as "yes," "no," and "maybe," or "agree," "disagree," and 
"not sure") and conditions (a number of different question types), although the re- 
sults can become more difficult to interpret as the matrix gets larger. 

Table 9.4. Imaginary data for 10 students' assessments of whether the practical and 
theoretical components of keyboard harmony tuition were satisfactory. 

count            no    yes    row total
practical         3      7       10
theoretical       8      2       10
column total     11      9       20
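
The figures in Table 9.4 can be checked with a short script. This is a sketch under stated assumptions: the expected counts are derived from the row and column totals (5.5 and 4.5 per cell) rather than from a flat five per cell, and two common versions of the statistic are computed. The likelihood-ratio version reproduces the value of 5.3 and p = .0213 quoted in the text, which suggests, though the chapter does not say so, that this is the version reported. The function names are illustrative only.

```python
import math

# Observed counts from Table 9.4; rows are question type, columns are (no, yes).
observed = [
    [3, 7],  # practical
    [8, 2],  # theoretical
]


def expected_counts(table):
    """Counts expected if answers were unrelated to question type."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Classic chi-square: sum of (observed - expected)^2 / expected."""
    exp = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
    )


def likelihood_ratio_chi_square(table):
    """Likelihood-ratio (G) statistic: 2 * sum of observed * ln(observed / expected)."""
    exp = expected_counts(table)
    return 2 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )


def p_value_df1(statistic):
    """Upper-tail probability of chi-square with 1 degree of freedom (2 x 2 table only)."""
    return math.erfc(math.sqrt(statistic / 2))


expected = expected_counts(observed)
pearson = pearson_chi_square(observed)
g_stat = likelihood_ratio_chi_square(observed)
print(f"Pearson chi-square = {pearson:.2f}, p = {p_value_df1(pearson):.4f}")
print(f"Likelihood-ratio chi-square = {g_stat:.2f}, p = {p_value_df1(g_stat):.4f}")
```

Both versions fall below the 0.05 level for these data, so the conclusion drawn in the text does not depend on which one is used.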

Experiments and the "Real World" 

At the outset of this chapter, I made some preliminary comments regarding the re- 
lationship between strict experimentation and empirical approaches of a more flex- 
ible and exploratory nature. In conclusion I address this relationship in more detail. 

In the natural sciences, and in experimental psychology, the most common 
method of proceeding is to carry out experiments that test hypotheses. Imagine 
studying the factors that influence musicians' success in a dictation task. Perhaps 
there is a suspicion that musicians who play a keyboard instrument as their principal 
study are better at dictation than those who specialize in another instrument. 
This, then, is a clear hypothesis: keyboard players will be better at dictation than 
other instrumentalists. More technically, the experiment is designed around the null 
hypothesis that there is no correlation between keyboard playing and being good at 
dictation. The experiment can disprove the null hypothesis by showing that there is 
such a correlation, but it can never prove the null hypothesis; if no correlation is 
found, that might just be because the experiment was badly designed or executed, 
perhaps because there was some complicating factor that was not allowed for. One 
simple way of setting up an experiment to study this would be to select a number of 
musicians and divide them into two groups, one containing first-study keyboard 
players, the other all the remaining musicians. This division into groups produces 
an independent variable. A dependent variable is also required: for example, a score on 
a standard dictation task. The hypothesis would be tested by comparing the mean 
scores of the two groups using a t-test. 
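
The comparison described here can be sketched as follows. This is an illustration only: the dictation scores are invented, the function name independent_t is hypothetical, and the version shown is the pooled-variance t-test for two independent groups, which assumes roughly equal spread in each group. The critical value is taken from a standard t table for 18 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical dictation scores out of 100 (not data from the chapter).
keyboard = [78, 85, 90, 72, 88, 81, 79, 93, 84, 76]
others = [70, 75, 82, 68, 79, 73, 66, 85, 77, 71]


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


t_value, df = independent_t(keyboard, others)

# Two-tailed critical value at the .05 level for 18 degrees of freedom.
CRITICAL_T_DF18 = 2.101

print(f"t({df}) = {t_value:.2f}")
if abs(t_value) > CRITICAL_T_DF18:
    print("The null hypothesis of no difference is rejected at the .05 level.")
else:
    print("The null hypothesis cannot be rejected at the .05 level.")
```

A result that fails to reach the critical value would not show that the two groups are equally good at dictation; it would only mean that this experiment gave no grounds for rejecting the null hypothesis.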

Simple as the experimental design is, and familiar as the task is, this example il- 
lustrates some of the difficulties in doing controlled research in music. Dividing the 
subjects into two groups assumes that any differences in their performance must be 
explained by whether or not they are keyboard players. But there are all sorts of ways 
in which this assumption might be confounded. It might be the case, for instance, 
that the keyboard players are of a higher overall standard than the other musicians, 
because these others have been drawn from a music school that needs to maintain 
its orchestras and therefore applies less exacting entry standards for orchestral mu- 
sicians; if this were the case, then grouping the students into keyboard and non- 
keyboard players would not be independent of their level of skill, so that the exper- 
iment might end up finding out about differing overall skill or aptitude levels rather 
than differences based on keyboard skills alone. Then again, there will be differences 
between the different subjects, some of which may affect their performance on the 
task: a particular subject may be more easily distracted, or may have received more 
training in dictation than another. These subject variables cannot be removed en- 
tirely, though their effect can be reduced (it may be possible to establish, for ex- 
ample, that neither group is better trained overall). Finally, it is important to ensure 
that situational variables do not distort the results. Imagine that while one group was 
being tested a noisy plane flew overhead, impairing the performance of that group 
as against the other. If the group in question was the keyboard one, such a situational 
variable might obscure their better performance. 

What this means is that not all aspects of musical behavior are best investigated 
through such highly constrained methods. There are many virtues to collecting and 
analyzing data that are relatively uncontrolled by the researcher; doing an experiment 
may artificially constrain the very behavior at issue. While something like dictation 
is a task that can easily be transformed into an experimental design, not all 
things one might wish to study are structured in this way. Consider, for example, try- 
ing to study emotional responses to music by asking subjects to decide whether a 
particular piece was "happy" or "sad." Leaving aside the fact that more choices might 
be helpful, such a task is extremely different from how we normally experience and 
respond to the emotional character of music. This is not to say that emotion cannot 
be studied experimentally, but it serves to illustrate the fact that an experiment may 
be so different from our normal way of listening to, or playing, music that the data 
may have little useful to say about real musical experiences. Experimental control re- 
duces the ecological validity of the research: the real world is highly variable and 
complex, and any control you exert runs the risk of reducing this complexity to such 
an extent that the behavior you observe has little connection with that world. 

Another reason to question whether experimental data are always the most use- 
ful is the problem of underlying, unmeasured variables. Returning to our experimen- 
tal example above, it is possible that there is a relationship between keyboard skill and 
a musical skill that was not measured in the experiment. Keyboard players might start 
playing earlier than other musicians, so that the real reason that they perform better 
is that they have simply been playing for longer. To discover whether this is true re- 
quires the collection of data from the real world, through a questionnaire, for example. 

The official name for a study that simply measures and compares existing variables 
is a correlational study (the relationship can be tested using a measure of association 
such as Pearson's r), and it allows one to make claims about whether two variables are related. 
As we saw above, correlating two or more types of observation is a powerful way of 
determining whether one has some predictable relationship with the other. Robson's 
(1993) excellent textbook on so-called "real-world" research shows how techniques 
more commonly used in the laboratory can be applied to more realistic situations. 
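
As a minimal sketch of how such a measure of association is calculated, the following computes Pearson's r for the keyboard example, relating the number of years each musician has been playing to a dictation score. The questionnaire data are invented for illustration and the function name pearson_r is hypothetical.

```python
from math import sqrt

# Hypothetical questionnaire data (not from the chapter): one pair per musician.
years_playing = [4, 6, 7, 9, 10, 12, 13, 15]
dictation_score = [61, 58, 70, 72, 69, 80, 78, 88]


def pearson_r(xs, ys):
    """Pearson's product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-variation of the two variables, scaled by the variation within each.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


r = pearson_r(years_playing, dictation_score)
print(f"r = {r:.2f}")
```

A value near +1 or -1 indicates a strong linear relationship and a value near 0 indicates little or none. Whether a given r is statistically significant depends on the number of observations and is normally read from a table or reported by a statistical package.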

Such non-experimental data have limitations: it is difficult, for example, to tell 
whether there is any causal relationship between two variables collected in this way. 
An experiment can show whether something the experimenter does (like giving dif- 
ferent instructions to two groups) causes a change in their behavior; she or he con- 
trols the cause, and can be reasonably clear that it precedes its effect. In the end it 
comes down to a matter of combining less formal and more experimental approaches. 
A common way of doing this is to start by collecting some real-world data 
in a relatively informal manner, thus identifying related variables before attempting 
to show clear causation with a more controlled study. Alternatively, a process of tri- 
angulation, in which data and methods of different kinds are simultaneously brought 
to bear on the same set of questions, can be effective. 



Concluding Remarks 

Whether the aim is to study performance, perception, memory or the distribution 
of particular events in scores or recordings, a repertoire of methodological skills al- 
lows the researcher to tackle projects in both a flexible and a systematic manner. The 
use of established procedures and tests is an integral part of this approach, not only 
because it ensures that data are not misinterpreted, but also because it means that 
results will be accessible to, and understood by, a wide body of potential readers. 
However, one should not be hidebound by a small number of techniques; although 
methods should correctly fit the type of data and the questions asked, it would be a 
shame if research were held back for want of an appropriate ready-made method. 
Although this chapter has done no more than introduce some of the most common 
techniques, I hope to have shown that data analysis is worth knowing about, and 
that such knowledge can be directly applied to the search for answers to musical 
questions. 



Note 



1. Information about the MEDS software, and a downloadable version of the program, 
can be found at http://www.ethnomusic.ucla.edu/systematic/Faculty/Kendall/meds.htm. 



References 

Abel, J. L., and Larkin, K. T. (1990). "Anticipation of performance among musicians: 
Physiological arousal, confidence, and state-anxiety." Psychology of Music 18: 
171-182. 

Besson, M., and Faita, F. (1995). "An event-related potential (ERP) study of musical 
expectancy: Comparison of musicians with nonmusicians." Journal of Experimental 
Psychology: Human Perception and Performance 21: 1278-1296. 

Clarke, E. F. (1993). "Imitating and evaluating real and transformed musical 
performances." Music Perception 10: 317-341. 

Clarke, E. F., and Krumhansl, C. L. (1990). "Perceiving musical time." Music Perception 7: 
213-252. 

Clarke, E. F., and Windsor, W. L. (2000). "Real and simulated expression: A listening 
study." Music Perception 17: 1-37. 

Davidson, J. W. (1993). "Visual perception of performance manner in the movements of 
solo musicians." Psychology of Music 21: 103-113. 

Dunterman, G. H. (1984). Introduction to linear models. Beverly Hills, Calif.: Sage. 

Grey, J. M. (1977). "Multidimensional perceptual scaling of musical timbres." Journal of the 
Acoustical Society of America 61: 1270-1277. 

Honing, H. (1990). "POCO: An environment for analysing, modifying, and generating 
expression in music." Proceedings of the 1990 International Computer Music Conference. 
San Francisco: Computer Music Association, 364-368. 

Iversen, G. R., and Norpoth, H. (1976). Analysis of Variance. Beverly Hills, Calif.: Sage. 

Juslin, P. N. (1997). "Emotional communication in music performance: A functionalist 
perspective and some data." Music Perception 14: 383-418. 

Kendall, R. A., and Carterette, E. C. (1991). "Perceptual scaling of simultaneous wind 
instrument timbres." Music Perception 8: 369-404. 

Krumhansl, C. L. (1997). "An exploratory study of musical emotions and 
psychophysiology." Canadian Journal of Experimental Psychology 51: 336-353. 




Krumhansl, C. L., and Kessler, E. J. (1982). "Tracing the dynamic changes in perceived 
tonal organization in a spatial representation of musical keys." Psychological Review 
89: 334-368. 

Miller, S. (1984). Experimental Design and Statistics. London: Methuen. 

North, A. C., and Hargreaves, D. J. (1996). "The effects of music on responses to a dining 
area." Journal of Environmental Psychology 16: 55-64. 

North, A. C., Hargreaves, D. J., and McKendrick, J. (1999). "The influence of in-store 
music on wine selections." Journal of Applied Psychology 84: 271-276. 

Oppenheim, A. N. (1966). Questionnaire Design and Attitude Measurement. London: 
Heinemann. 

Palmer, C. (1989). "Mapping musical thought to musical performance." Journal of 
Experimental Psychology: Human Perception and Performance 15: 331-346. 

Repp, B. H. (1992). "Diversity and commonality in music performance: An analysis of 
timing microstructure in Schumann's Träumerei." Journal of the Acoustical Society of 
America 92: 2546-2568. 

Repp, B. H. (1997). "The aesthetic quality of a quantitatively average music performance: 
Two preliminary experiments." Music Perception 14: 419-444. 

Repp, B. H. (1999). "Detecting deviations from metronomic timing in music: Effects of 
perceptual structure on the mental timekeeper." Perception and Psychophysics 61: 
529-548. 

Robson, C. (1983). Experiment, Design and Statistics in Psychology. Harmondsworth: 
Penguin. 

Robson, C. (1993). Real World Research. Oxford: Blackwell. 

Seashore, C. E. (1919). Seashore Measures of Musical Talent. New York: Columbia 
Phonograph Co. 

Seashore, C. [1967 (1938)]. Psychology of Music. McGraw-Hill. (Republished by Dover 
Books, New York, 1967.) 

Shaffer, L. H. (1981). "Performances of Chopin, Bach and Bartok: Studies in motor 
programming." Cognitive Psychology 13: 326-376. 

Sloboda, J. A., Davidson, J. W., Howe, M. J. A., and Moore, D. G. (1996). "The role of 
practice in the development of expert musical performance." British Journal of 
Psychology 87: 287-309. 

Williamon, A. (1999). "The value of performing from memory." Psychology of Music 27: 
84-95. 

Windsor, W. L. (1993). "Dynamic accents and the categorical perception of metre." 
Psychology of Music 21: 127-140. 

Wing, H. D. (1918). Standardised Tests of Musical Intelligence. Windsor: NFER-Nelson. 





